Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18906#discussion_r163757006

--- Diff: python/pyspark/sql/functions.py ---
@@ -2264,6 +2272,16 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     ...     return pd.Series(np.random.randn(len(v))
     >>> random = random.asNondeterministic()  # doctest: +SKIP

+    .. note:: The user-defined functions are considered to be able to return null values by default.
+        If your function is not nullable, call `asNonNullable` on the user defined function.
+        E.g.:
+
+    >>> @pandas_udf('string', PandasUDFType.SCALAR)  # doctest: +SKIP
+    ... def get_user(v):
+    ...     import getpass as gp
+    ...     return gp.getuser()
--- End diff --

I don't think this is quite the right example. A correct and better one would look like this:

```python
@pandas_udf("string")
def foo(s):
    import getpass
    import pandas
    return pandas.Series(getpass.getuser()).repeat(s.size)
```
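The reasoning behind the suggestion, as far as I understand it, is that a scalar pandas UDF is called with a `pandas.Series` per batch and is expected to return a `pandas.Series` of the same length, whereas `get_user` in the diff returns a single plain string. The sketch below illustrates that difference with plain pandas and no Spark session. The function names, the `batch` input, and the fixed user name `"alice"` (a stand-in for `getpass.getuser()` so the result does not depend on the machine) are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for getpass.getuser(), so the output is machine-independent.
USER = "alice"


def get_user_scalar(v):
    # Shape of the example in the diff: one plain string,
    # no matter how many rows the input batch has.
    return USER


def get_user_series(s):
    # Shape of the suggested fix: a one-element Series repeated
    # once per input row, so the lengths match.
    return pd.Series(USER).repeat(s.size)


batch = pd.Series(["a", "b", "c"])  # hypothetical input batch

bad = get_user_scalar(batch)
good = get_user_series(batch)

print(isinstance(bad, pd.Series))  # the scalar version does not return a Series
print(len(good) == len(batch))     # the Series version has one value per row
```

Note that `repeat` keeps the original index label on every row, so the result here has a repeated index of `0`. That should not matter if only the values and the length are used, but I have not checked how Spark treats the index.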