Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18659#discussion_r139580569

    --- Diff: python/pyspark/worker.py ---
    @@ -71,7 +73,19 @@ def wrap_udf(f, return_type):
         return lambda *a: f(*a)


    -def read_single_udf(pickleSer, infile):
    +def wrap_pandas_udf(f, return_type):
    +    def verify_result_length(*a):
    +        kwargs = a[-1]
    +        result = f(*a[:-1], **kwargs)
    +        if len(result) != kwargs["length"]:
    +            raise RuntimeError("Result vector from pandas_udf was not the required length: "
    +                               "expected %d, got %d\nUse input vector length or kwarg['length']"
    +                               % (kwargs["length"], len(result)))
    +        return result, toArrowType(return_type)
    --- End diff --

    Can we move `toArrowType(return_type)` out of `verify_result_length` to avoid calculating it for each block?
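A minimal sketch of the refactor the reviewer is suggesting, using a hypothetical stand-in `to_arrow_type` for pyspark's `toArrowType`: the conversion is hoisted out of the inner wrapper so it runs once when the UDF is wrapped, rather than once per block of data. The `CONVERSIONS` list is instrumentation added here only to make the once-per-wrap behavior observable; it is not part of the original PR.

```python
CONVERSIONS = []  # instrumentation: records each type conversion (for illustration only)


def to_arrow_type(return_type):
    # Hypothetical stand-in for pyspark's toArrowType, which maps a
    # Spark SQL type to an Arrow type.
    CONVERSIONS.append(return_type)
    return "arrow<%s>" % return_type


def wrap_pandas_udf(f, return_type):
    # Hoisted per the review comment: computed once at wrap time,
    # not inside verify_result_length on every block.
    arrow_return_type = to_arrow_type(return_type)

    def verify_result_length(*a):
        kwargs = a[-1]
        result = f(*a[:-1], **kwargs)
        if len(result) != kwargs["length"]:
            raise RuntimeError("Result vector from pandas_udf was not the required length: "
                               "expected %d, got %d"
                               % (kwargs["length"], len(result)))
        # Reuses the precomputed Arrow type instead of recomputing it.
        return result, arrow_return_type

    return verify_result_length
```

With this shape, calling the wrapped UDF over many blocks pays the type-conversion cost only once, while the per-block length check is unchanged.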