[ https://issues.apache.org/jira/browse/SPARK-19161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
holdenk resolved SPARK-19161.
-----------------------------
    Resolution: Fixed
    Fix Version/s: 2.2.0

Issue resolved by pull request 16534
[https://github.com/apache/spark/pull/16534]

> Improving UDF Docstrings
> ------------------------
>
>                 Key: SPARK-19161
>                 URL: https://issues.apache.org/jira/browse/SPARK-19161
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark, SQL
>    Affects Versions: 1.6.0, 2.0.0, 2.1.0, 2.2.0
>            Reporter: Maciej Szymkiewicz
>             Fix For: 2.2.0
>
>
> Current state
> Right now `udf` returns a `UserDefinedFunction` object, which doesn't provide
> a meaningful docstring:
> {code}
> In [1]: from pyspark.sql.types import IntegerType
> In [2]: from pyspark.sql.functions import udf
> In [3]: def _add_one(x):
>             """Adds one"""
>             if x is not None:
>                 return x + 1
>    ...:
> In [4]: add_one = udf(_add_one, IntegerType())
> In [5]: ?add_one
> Type:        UserDefinedFunction
> String form: <pyspark.sql.functions.UserDefinedFunction object at 0x7f281ed2d198>
> File:        ~/Spark/spark-2.0/python/pyspark/sql/functions.py
> Signature:   add_one(*cols)
> Docstring:
> User defined function in Python
> .. versionadded:: 1.3
> In [6]: help(add_one)
> Help on UserDefinedFunction in module pyspark.sql.functions object:
> class UserDefinedFunction(builtins.object)
> |  User defined function in Python
> |
> |  .. versionadded:: 1.3
> |
> |  Methods defined here:
> |
> |  __call__(self, *cols)
> |      Call self as a function.
> |
> |  __del__(self)
> |
> |  __init__(self, func, returnType, name=None)
> |      Initialize self. See help(type(self)) for accurate signature.
> |
> |  ----------------------------------------------------------------------
> |  Data descriptors defined here:
> |
> |  __dict__
> |      dictionary for instance variables (if defined)
> |
> |  __weakref__
> |      list of weak references to the object (if defined)
> (END)
> {code}
> It is possible to extract the function:
> {code}
> In [7]: ?add_one.func
> Signature: add_one.func(x)
> Docstring: Adds one
> File:      ~/Spark/spark-2.0/<ipython-input-3-d2d8e4c530ac>
> Type:      function
> In [8]: help(add_one.func)
> Help on function _add_one in module __main__:
> _add_one(x)
>     Adds one
> {code}
> but this assumes that the end user is aware of the distinction between UDFs
> and built-in functions.
> Proposed
> Copy the input function's docstring to the UDF object or function wrapper.
> {code}
> In [1]: from pyspark.sql.types import IntegerType
> In [2]: from pyspark.sql.functions import udf
> In [3]: def _add_one(x):
>             """Adds one"""
>             if x is not None:
>                 return x + 1
>    ...:
> In [4]: add_one = udf(_add_one, IntegerType())
> In [5]: ?add_one
> Signature: add_one(x)
> Docstring:
> Adds one
> SQL Type: IntegerType
> File:      ~/Workspace/spark/<ipython-input-3-d2d8e4c530ac>
> Type:      function
> In [6]: help(add_one)
> Help on function _add_one in module __main__:
> _add_one(x)
>     Adds one
>
>     SQL Type: IntegerType
> (END)
> {code}