[ https://issues.apache.org/jira/browse/SPARK-19160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maciej Szymkiewicz updated SPARK-19160:
---------------------------------------
    Affects Version/s:     (was: 1.5.0)

> Decorator for UDF creation.
> ---------------------------
>
>                 Key: SPARK-19160
>                 URL: https://issues.apache.org/jira/browse/SPARK-19160
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark, SQL
>    Affects Versions: 1.6.0, 2.0.0, 2.1.0, 2.2.0
>            Reporter: Maciej Szymkiewicz
>
> Right now there are a few ways we can create a UDF:
>
> - With a standalone function:
> {code}
> def _add_one(x):
>     """Adds one"""
>     if x is not None:
>         return x + 1
>
> add_one = udf(_add_one, IntegerType())
> {code}
> This allows full control flow, including exception handling, but duplicates names.
>
> - With a `lambda` expression:
> {code}
> add_one = udf(lambda x: x + 1 if x is not None else None, IntegerType())
> {code}
> No name duplication, but limited to pure expressions.
>
> - Using a nested function with an immediate call:
> {code}
> def add_one(c):
>     def add_one_(x):
>         if x is not None:
>             return x + 1
>     return udf(add_one_, IntegerType())(c)
> {code}
> Quite verbose, but it enables full control flow and clearly indicates the
> expected number of arguments.
>
> - Using the `udf` function as a decorator:
> {code}
> @udf
> def add_one(x):
>     """Adds one"""
>     if x is not None:
>         return x + 1
> {code}
> Possible, but only with the default `returnType` (or curried, e.g.
> `@partial(udf, returnType=IntegerType())`).
>
> Proposed:
>
> Add a `udf` decorator which can be used as follows:
> {code}
> from pyspark.sql.decorators import udf
>
> @udf(IntegerType())
> def add_one(x):
>     """Adds one"""
>     if x is not None:
>         return x + 1
> {code}
> or
> {code}
> @udf()
> def strip(x):
>     """Strips String"""
>     if x is not None:
>         return x.strip()
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
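The proposal hinges on a decorator that works both bare (`@udf`) and parameterized (`@udf(returnType)`). Below is a minimal, Spark-free sketch of that dual-mode dispatch: `FakeUdf` is a stand-in for Spark's real UDF wrapper, and the string type names (`"integer"`, `"string"`) are illustrative placeholders, not Spark's API.

```python
class FakeUdf:
    """Stand-in for Spark's UDF wrapper; records the return type
    and preserves the wrapped function's docstring."""

    def __init__(self, func, return_type="string"):
        self.func = func
        self.return_type = return_type
        self.__doc__ = func.__doc__

    def __call__(self, *args):
        return self.func(*args)


def udf(f=None, returnType="string"):
    # Bare usage: @udf passes the function itself as `f`.
    if callable(f):
        return FakeUdf(f, returnType)
    # Parameterized usage: @udf("integer") or @udf(returnType="integer")
    # must return a decorator that wraps the function later.
    actual_type = f if f is not None else returnType
    return lambda func: FakeUdf(func, actual_type)


@udf("integer")
def add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1


@udf
def strip_str(x):
    """Strips a string"""
    if x is not None:
        return x.strip()
```

The key design choice is the `callable(f)` check: when the decorator is applied bare, Python hands it the function directly, whereas a parameterized application hands it the return type and expects a decorator back.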