[ https://issues.apache.org/jira/browse/SPARK-33221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aoyuan Liao resolved SPARK-33221.
---------------------------------
    Resolution: Not A Problem

> UDF in when operation applied in all the rows regardless the condition
> ----------------------------------------------------------------------
>
>                 Key: SPARK-33221
>                 URL: https://issues.apache.org/jira/browse/SPARK-33221
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.5, 3.0.1
>            Reporter: Marco De Nadai
>            Priority: Major
>
> Hi all,
> I think there is a bug or, at least, an undocumented behaviour of PySpark
> UDFs. The code below tries to apply the UDF only to a subset of rows
> (useful for long DataFrames).
>
> {code:python}
> from pyspark.sql import SparkSession, Window
> from pyspark.sql import functions as F
> from pyspark.sql.functions import udf
> from pyspark.sql.types import BooleanType
>
> spark = SparkSession.builder.getOrCreate()
>
> # Raises on None, so any call on a row with a null lag_price is visible.
> @udf(returnType=BooleanType())
> def test_udf(x):
>     if x is None:
>         raise Exception(x)
>     return True
>
> data = [(1,11,1),(1,22,2),(1,33,3),(2,44,1),(3,55,1),(4,66,1)]
> dataColumns = ["uid","price","day"]
> test = spark.createDataFrame(data=data, schema=dataColumns)
> w = Window.partitionBy('uid').orderBy('uid','day')
> test = test.withColumn('lag_price', F.lead(F.col('price')).over(w))
> print(test.dtypes)
> test = test.withColumn('condition', F.col('lag_price').isNotNull())
> # The UDF should only be needed where 'condition' is true.
> test.withColumn('appliedUDF', F.when(F.col('condition'),
>     test_udf(F.col('lag_price'))).otherwise(False)).show()
> {code}
>
> It throws this error:
> {code}
> File "<command-3513778682084612>", line 4, in test_udf
> Exception: None
> {code}
>
> Is this expected? Am I missing something?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)