Github user mstewart141 commented on the issue: https://github.com/apache/spark/pull/20900

Many (though not all; I don't think `callable`s are impacted) of the limitations of `pandas_udf` relative to `udf` in this domain stem from the fact that `pandas_udf` doesn't allow keyword arguments at the call site. This obviously affects plain function-based `pandas_udf`s, but also partial functions, where one would typically need to specify the partially-applied argument by name.

In the incremental commits of this PR, as of https://github.com/apache/spark/pull/20900/commits/9ea2595f0cecb0cd05e0e6b99baf538679332e8b, you can see the kind of things I was investigating to try to fix that case. Indeed, my original PR was (ambitiously) titled something about enabling keyword arguments for all `pandas_udf`s.

This is actually very easy to do for *functions* on python3 (and maybe for plain functions in py2 as well, though I suspect that is also rather tricky, since `getargspec` is pretty unhelpful when it comes to some of the keyword-argument metadata one would need). However, it is rather harder for the partial-function case: one quickly gets into stacktraces from places like `python/pyspark/worker.py`, where the current strategy does not realize that a column from the arguments list may already be "accounted for", so one runs into duplicate arguments passed for `a`, e.g., as a result.

My summary: the change to allow keyword arguments for functions is simple (at least in py3; indeed my incremental commit referenced above does this), but for partial functions maybe not so much. I'm pretty confident I'm most of the way to accomplishing the former, but not the latter. However, I have no substantial knowledge of the pyspark codebase, so you will likely have better luck there, should you go down that route :)

**TL;DR**: I could work on a PR to allow keyword arguments for python3 functions (not partials, and not py2), but that is likely too narrow a goal given the broader context.
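For anyone following along, the duplicate-argument failure mode can be reproduced in plain Python, without Spark. This is a minimal sketch; `my_udf` is a hypothetical stand-in for a `pandas_udf` body, and the positional call stands in for how the worker currently forwards the full column list:

```python
import functools

# Hypothetical scalar function standing in for a pandas_udf body;
# 'a' and 'b' would be Column arguments in the real Spark case.
def my_udf(a, b):
    return a + b

# To partially apply 'b', one must name it at the partial() call site:
part = functools.partial(my_udf, b=10)

# Calling with a keyword works as intended:
print(part(a=1))  # 11

# But a caller that only knows the original two-argument arity and
# forwards everything positionally hits the duplicate-argument failure,
# analogous to the worker.py stacktraces described above:
try:
    part(1, 2)  # positional 2 also lands in 'b'
except TypeError as exc:
    print(exc)  # my_udf() got multiple values for argument 'b'
```

This is why simply letting keywords through is not enough for partials: the argument-forwarding side also has to know which parameters the partial has already bound.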
One general question: how do we tend to think about the py2/py3 split for API quirks/features? Must everything added for py3 also be functional in py2?