Jim Fulton created SPARK-28978:
----------------------------------

             Summary: PySpark: Can't pass more than 256 arguments to a UDF
                 Key: SPARK-28978
                 URL: https://issues.apache.org/jira/browse/SPARK-28978
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.0
            Reporter: Jim Fulton


This code:

[https://github.com/apache/spark/blob/712874fa0937f0784f47740b127c3bab20da8569/python/pyspark/worker.py#L367-L379]

Creates Python lambdas that call each UDF with its arguments enumerated one by one, rather 
than using varargs.  For example: `lambda a: f(a[0], a[1], ...)`.

This fails when there are more than 256 arguments.
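For illustration (this is a minimal sketch, not the actual worker.py code), the generated lambda source enumerates one call argument per column, and I believe the failure comes from CPython refusing to compile a call expression with that many explicit arguments on older interpreters (the limit was lifted in CPython 3.7):

{code:python}
# Minimal sketch of the failure mode, not Spark code: build a lambda source
# string with one explicit argument per column, the same shape worker.py
# generates, and compile it with eval().
n = 300  # stand-in for the number of UDF arguments / feature columns
src = "lambda a: f(%s)" % ", ".join("a[%d]" % i for i in range(n))

try:
    fn = eval(src, {"f": lambda *args: len(args)})
    print(fn(list(range(n))))  # prints 300 on CPython >= 3.7
except SyntaxError as e:
    # On older CPython, compiling a call with this many explicit arguments
    # fails at this point.
    print("generated lambda did not compile:", e)
{code}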

mlflow, when generating model predictions, uses an argument for each feature 
column.  I have a model with > 500 features.

I was able to hack around this easily by changing the generated lambdas to use 
varargs, as in `lambda a: f(*a)`.

I don't know why these lambdas were written this way.  Using varargs is much 
simpler and works fine in my testing.
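
As a sketch of what I mean (using a hypothetical stand-in `f`, not the real wrapped UDF), the varargs form compiles to the same small expression no matter how many columns there are:

{code:python}
# Sketch of the workaround with a stand-in f rather than the real wrapped UDF:
# the varargs lambda is a fixed-size expression, so the column count never
# appears in the generated source.
f = lambda *cols: sum(cols)          # hypothetical stand-in for the UDF
mapper = eval("lambda a: f(*a)", {"f": f})

row = list(range(500))               # e.g. 500 feature columns
print(mapper(row))                   # 124750, regardless of column count
{code}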
