Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191113977

--- Diff: python/pyspark/util.py ---
@@ -89,6 +93,33 @@ def majorMinorVersion(sparkVersion):
                               " version numbers.")


+def fail_on_stopiteration(f):
+    """
+    Wraps the input function to fail on 'StopIteration' by raising a 'RuntimeError';
+    this prevents silent loss of data when 'f' is used in a for loop
+    """
+    def wrapper(*args, **kwargs):
+        try:
+            return f(*args, **kwargs)
+        except StopIteration as exc:
+            raise RuntimeError(
+                "Caught StopIteration thrown from user's code; failing the task",
+                exc
+            )
+
+    # prevent inspect from failing
+    # e.g. inspect.getargspec(sum) raises
+    # TypeError: <built-in function sum> is not a Python function
+    try:
+        argspec = _get_argspec(f)
--- End diff --

You said to do it in `udf.UserDefinedFunction._create_judf`, but sent the code of `udf._create_udf`. I assume you meant the former, since we cannot do that in `_create_udf`: `UserDefinedFunction._wrapped` needs the original function for its documentation and other metadata. And yes, I will also simplify the code as you suggested.
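For context, here is a minimal standalone sketch of the wrapper pattern discussed in the diff above. `bad_udf` is a hypothetical example of user code that leaks a `StopIteration`; without the wrapper, such an exception can silently terminate the generator or for loop driving the function (the "silent loss of data" the docstring mentions), whereas the wrapper surfaces it as a hard failure:

```python
def fail_on_stopiteration(f):
    """Wrap f so that a StopIteration escaping it becomes a RuntimeError.

    A StopIteration that leaks out of user code into the iteration
    machinery consuming f can end that iteration early, silently
    dropping the remaining data instead of reporting an error.
    """
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError(
                "Caught StopIteration thrown from user's code; failing the task",
                exc,
            )
    return wrapper


def bad_udf(x):
    # Hypothetical user function that raises StopIteration for one input.
    if x == 2:
        raise StopIteration
    return x * 10


safe_udf = fail_on_stopiteration(bad_udf)
try:
    results = [safe_udf(x) for x in range(5)]
except RuntimeError as e:
    # The task fails loudly instead of truncating the output.
    print("task failed:", e.args[0])
```

This is only an illustration of the failure mode, not the exact Spark code path; in the PR the wrapped function is additionally run through `_get_argspec` so that `inspect`-based introspection still works.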