Hi all,

I would like to discuss adding all SQL functions into the Scala, Python and R
APIs.
There are around 175 SQL functions that do not exist in Scala, Python and R.
For example, we don't have pyspark.sql.functions.percentile, but you can
invoke it as a SQL function, e.g., SELECT percentile(...).

The reason we do not have all functions in the first place is that we wanted
to add only commonly used functions; see also
https://github.com/apache/spark/pull/21318 (which I agreed with at the time).

However, this has been raised multiple times over the years by the OSS
community: on the dev mailing list, in JIRAs, on Stack Overflow, etc.
It seems confusing which functions are available and which are not.

Yes, we have a workaround: we can call any expression via expr("...") or
call_udf("...", Columns ...).
But this is still not very user-friendly, because users expect these
functions to be available under the functions namespace.

Therefore, I would like to propose adding all expressions to all language
APIs, so that Spark is simpler and there is less confusion about which APIs
are in functions and which are not.

Any thoughts?
