Bjørn Jørgensen created SPARK-38763:
---------------------------------------

             Summary: Pandas API on Spark: can't apply lambda to columns
                 Key: SPARK-38763
                 URL: https://issues.apache.org/jira/browse/SPARK-38763
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.3.0, 3.4.0
            Reporter: Bjørn Jørgensen


With a Spark master build from 08 November 21, I can use this code to
rename columns:

{code:python}
import re

# Strip each known prefix from every column name.
pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x))
pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x))
pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x))
{code}

But with a current build, the same code raises this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x))
      2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x))
      3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x))

File /opt/spark/python/pyspark/pandas/frame.py:10636, in DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors)
  10632     index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = gen_mapper_fn(
  10633         index
  10634     )
  10635 if columns:
> 10636     columns_mapper_fn, _, _ = gen_mapper_fn(columns)
  10638 if not index and not columns:
  10639     raise ValueError("Either `index` or `columns` should be provided.")

File /opt/spark/python/pyspark/pandas/frame.py:10603, in DataFrame.rename.<locals>.gen_mapper_fn(mapper)
  10601 elif callable(mapper):
  10602     mapper_callable = cast(Callable, mapper)
> 10603     return_type = cast(ScalarType, infer_return_type(mapper))
  10604     dtype = return_type.dtype
  10605     spark_return_type = return_type.spark_type

File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in infer_return_type(f)
    560 tpe = get_type_hints(f).get("return", None)
    562 if tpe is None:
--> 563     raise ValueError("A return value is required for the input function")
    565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, SeriesType):
    566     tpe = tpe.__args__[0]

ValueError: A return value is required for the input function
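
A minimal sketch of what the traceback suggests, not a confirmed fix: infer_return_type() looks up the "return" entry of get_type_hints(f), and a lambda cannot carry a return annotation, so the lookup yields None. The helper names strip_prefixes and build_column_mapping and the sample column names are hypothetical and exist only for illustration; the prefixes are the ones from the code above.

```python
import re
from typing import get_type_hints

# Prefixes taken from the report above.
PREFIXES = ("DOFFIN_ESENDERS:", "FORM_SECTION:", "F05_2014:")

# A lambda has no return annotation, so the same lookup that
# infer_return_type() performs in the traceback finds nothing.
strip_lambda = lambda x: re.sub("DOFFIN_ESENDERS:", "", x)
lambda_hint = get_type_hints(strip_lambda).get("return", None)


def strip_prefixes(name: str) -> str:
    """Hypothetical helper: remove every known prefix from a column name."""
    for prefix in PREFIXES:
        name = name.replace(prefix, "")
    return name


# A named function with "-> str" does expose a return type hint.
named_hint = get_type_hints(strip_prefixes).get("return", None)


def build_column_mapping(columns):
    """Hypothetical helper: map each old column name to its cleaned name."""
    return {col: strip_prefixes(col) for col in columns}


# Sample column names are made up for illustration.
mapping = build_column_mapping(
    ["DOFFIN_ESENDERS:FORM_SECTION:F05_2014:ID", "DATE"]
)
```

If this reading is right, passing the annotated function (pf05.rename(columns=strip_prefixes)) or a plain dict (pf05.rename(columns=build_column_mapping(pf05.columns))) might avoid the ValueError. Neither call has been tested against the affected builds.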





