[ https://issues.apache.org/jira/browse/SPARK-30681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-30681:
---------------------------------
    Fix Version/s:     (was: 3.1.1)
                   3.1.0

> Add higher order functions API to PySpark
> -----------------------------------------
>
>                 Key: SPARK-30681
>                 URL: https://issues.apache.org/jira/browse/SPARK-30681
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Maciej Szymkiewicz
>            Assignee: Maciej Szymkiewicz
>            Priority: Major
>             Fix For: 3.1.0
>
>
> As of 3.0.0, higher-order functions are available in SQL and Scala, but not in
> PySpark, forcing Python users to invoke them through {{expr}},
> {{selectExpr}} or {{sql}}.
> This is error-prone and not well documented. Spark should provide
> {{pyspark.sql}} wrappers that accept plain Python functions (within the
> limits of {{(*Column) -> Column}}) as arguments.
> {code:python}
> from pyspark.sql import Column
> from pyspark.sql.functions import transform, transform_values, trim, upper
> # Upper-case and trim every element of the array column "values"
> df.select(transform("values", lambda c: trim(upper(c))))
> # Add 1 to every value of the map column "data"
> def increment_values(k: Column, v: Column) -> Column:
>     return v + 1
> df.select(transform_values("data", increment_values))
> {code}
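To illustrate how wrappers could accept plain Python functions within the {{(*Column) -> Column}} limit, here is a toy sketch in pure Python (no Spark required). {{Col}}, {{_lambda_sql}} and the toy {{transform}}, {{transform_values}}, {{upper}} and {{trim}} are hypothetical stand-ins that only build SQL text. The idea of inspecting the callable's arity and calling it once with placeholder lambda variables is an assumption about the approach, not a description of PySpark's actual implementation.

```python
import inspect
from typing import Callable


class Col:
    """Hypothetical toy stand-in for pyspark.sql.Column: it only carries SQL text."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def __add__(self, other: object) -> "Col":
        # Non-Col operands are rendered with repr(), which is fine for numbers here.
        rhs = other.sql if isinstance(other, Col) else repr(other)
        return Col(f"({self.sql} + {rhs})")


def upper(c: Col) -> Col:
    return Col(f"upper({c.sql})")


def trim(c: Col) -> Col:
    return Col(f"trim({c.sql})")


def _lambda_sql(f: Callable[..., Col], min_args: int, max_args: int) -> str:
    """Hypothetical helper: turn a Python callable into SQL lambda text."""
    # Count positional parameters to decide how many lambda variables to bind.
    params = [
        p
        for p in inspect.signature(f).parameters.values()
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY,
                      inspect.Parameter.POSITIONAL_OR_KEYWORD)
    ]
    n = len(params)
    if not min_args <= n <= max_args:
        raise ValueError(f"expected a function of {min_args} to {max_args} arguments")
    names = ["x", "y", "z"][:n]
    # Call the function once with placeholder columns; the result is an expression.
    body = f(*(Col(name) for name in names))
    args = names[0] if n == 1 else "(" + ", ".join(names) + ")"
    return f"{args} -> {body.sql}"


def transform(col: str, f: Callable[..., Col]) -> Col:
    # SQL transform takes either (element) or (element, index) lambdas.
    return Col(f"transform({col}, {_lambda_sql(f, 1, 2)})")


def transform_values(col: str, f: Callable[[Col, Col], Col]) -> Col:
    # SQL transform_values takes a (key, value) lambda.
    return Col(f"transform_values({col}, {_lambda_sql(f, 2, 2)})")


if __name__ == "__main__":
    print(transform("values", lambda c: trim(upper(c))).sql)
    # transform(values, x -> trim(upper(x)))
    print(transform_values("data", lambda k, v: v + 1).sql)
    # transform_values(data, (x, y) -> (y + 1))
```

In this toy, the Python function is traced into an expression once rather than executed per row, which is why only callables that map Columns to a Column fit the scheme.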