[ https://issues.apache.org/jira/browse/SPARK-38763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516172#comment-17516172 ]
Xinrong Meng commented on SPARK-38763:
--------------------------------------

Hi [~bjornjorgensen], thanks for raising that! The workaround is to use a function with a return type annotation rather than a lambda (a sketch follows the quoted issue below). I am fixing this in https://issues.apache.org/jira/browse/SPARK-38766.

> Pandas API on Spark can't apply a lambda to columns
> ----------------------------------------------------
>
>                 Key: SPARK-38763
>                 URL: https://issues.apache.org/jira/browse/SPARK-38763
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.3.0, 3.4.0
>            Reporter: Bjørn Jørgensen
>            Priority: Major
>
> With a Spark master build from 8 November 2021 I could use this code to rename columns:
> {code:java}
> pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x))
> pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x))
> pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x))
> {code}
> But now the same code raises this error:
>
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> Input In [5], in <cell line: 1>()
> ----> 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x))
>       2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x))
>       3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x))
>
> File /opt/spark/python/pyspark/pandas/frame.py:10636, in DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors)
>   10632 index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = gen_mapper_fn(
>   10633     index
>   10634 )
>   10635 if columns:
> > 10636     columns_mapper_fn, _, _ = gen_mapper_fn(columns)
>   10638 if not index and not columns:
>   10639     raise ValueError("Either `index` or `columns` should be provided.")
>
> File /opt/spark/python/pyspark/pandas/frame.py:10603, in DataFrame.rename.<locals>.gen_mapper_fn(mapper)
>   10601 elif callable(mapper):
>   10602     mapper_callable = cast(Callable, mapper)
> > 10603     return_type = cast(ScalarType, infer_return_type(mapper))
>   10604     dtype = return_type.dtype
>   10605     spark_return_type = return_type.spark_type
>
> File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in infer_return_type(f)
>     560 tpe = get_type_hints(f).get("return", None)
>     562 if tpe is None:
> --> 563     raise ValueError("A return value is required for the input function")
>     565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, SeriesType):
>     566     tpe = tpe.__args__[0]
>
> ValueError: A return value is required for the input function
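For reference, a minimal sketch of that workaround, assuming a working pyspark.pandas environment. The psdf sample DataFrame and the drop_doffin_prefix helper are made up for illustration; the reporter's pf05 and the other two prefixes would follow the same pattern.

{code:python}
import re

import pyspark.pandas as ps

# Made-up stand-in for the reporter's pf05 DataFrame.
psdf = ps.DataFrame({"DOFFIN_ESENDERS:NAME": [1, 2], "DOFFIN_ESENDERS:ID": [3, 4]})

# The return type annotation is what matters: infer_return_type() looks for a
# "return" type hint, and a bare lambda has none, which raises the ValueError
# shown in the traceback above.
def drop_doffin_prefix(x: str) -> str:
    return re.sub('DOFFIN_ESENDERS:', '', x)

psdf = psdf.rename(columns=drop_doffin_prefix)
print(psdf.columns)  # expected: Index(['NAME', 'ID'], dtype='object')
{code}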