Jim Fulton created SPARK-28978:
----------------------------------

             Summary: PySpark: Can't pass more than 256 arguments to a UDF
                 Key: SPARK-28978
                 URL: https://issues.apache.org/jira/browse/SPARK-28978
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.0
            Reporter: Jim Fulton

This code:

[https://github.com/apache/spark/blob/712874fa0937f0784f47740b127c3bab20da8569/python/pyspark/worker.py#L367-L379]

creates Python lambdas that call UDF functions by passing each argument individually rather than using varargs, for example `lambda a: f(a[0], a[1], ...)`. This fails when there are more than 256 arguments.

mlflow, when generating model predictions, passes one argument per feature column. I have a model with more than 500 features.

I was able to work around this easily by changing the generated lambdas to use varargs, as in `lambda a: f(*a)`.

I don't know why these lambdas were written the way they were. Using varargs is much simpler and works fine in my testing.
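To make the difference concrete, here is a minimal standalone sketch of the two lambda shapes. It is not the actual worker.py code: `make_explicit_mapper`, `make_varargs_mapper`, and `count_args` are hypothetical names used only for illustration. My understanding is that the failure comes from the interpreter's cap on explicit call arguments (255 in CPython before 3.7), so on newer interpreters the explicit form may compile fine, and the sketch tolerates both outcomes.

```python
def make_explicit_mapper(f, n_args):
    """Build `lambda a: f(a[0], a[1], ...)` from generated source text.

    Hypothetical helper: it mimics the one-argument-per-column shape described
    in the report, not the real PySpark implementation.
    """
    args = ", ".join("a[%d]" % i for i in range(n_args))
    # `f` is made visible to the generated lambda through the eval globals.
    return eval("lambda a: f(%s)" % args, {"f": f})


def make_varargs_mapper(f):
    """Build the varargs shape: no source generation, no per-argument text."""
    return lambda a: f(*a)


def count_args(*cols):
    """Stand-in for a UDF that takes one argument per feature column."""
    return len(cols)


row = list(range(500))  # one value per feature column

# The varargs form does not depend on how many columns there are.
varargs_result = make_varargs_mapper(count_args)(row)

# The explicit form may be rejected at compile time by older interpreters
# (assumption: CPython before 3.7 raises SyntaxError here).
try:
    explicit_result = make_explicit_mapper(count_args, len(row))(row)
except SyntaxError:
    explicit_result = None

print("varargs:", varargs_result)
print("explicit:", explicit_result)
```

Where both forms compile they return the same value, so switching to varargs should not change behavior for UDFs that already work.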