Thank you for the proposal. I'm wondering whether we are going to consider these as release blockers or not.

In general, I don't think those SQL functions should be available in all languages as release blockers (especially in R, or in new Spark Connect languages like Go and Rust). If they are not release blockers, we may allow some existing or future community PRs only before the feature freeze (= branch cut).

Thanks,
Dongjoon.

On Wed, May 24, 2023 at 7:09 PM Jia Fan <fan...@apache.org> wrote:

> +1
> It is important that different APIs can be used to call the same function.
>
> Ryan Berti <rbe...@netflix.com.invalid> wrote on Thu, May 25, 2023, 01:48:
>
>> During my recent experience developing functions, I found that
>> identifying the locations (SQL, Connect, functions.scala, functions.py,
>> FunctionRegistry, plus whatever is required for R) and the standards for
>> adding function signatures was not straightforward (should you use
>> optional args or overloaded functions? Which col/lit helpers should be
>> used when?). Are there docs describing all of the locations and standards
>> for defining a function? If not, that'd be great to have too.
>>
>> Ryan Berti
>> Senior Data Engineer | Ads DE
>> M 7023217573
>> 5808 W Sunset Blvd | Los Angeles, CA 90028
>>
>> On Wed, May 24, 2023 at 12:44 AM Enrico Minack <i...@enrico.minack.dev> wrote:
>>
>>> +1
>>>
>>> Functions available in SQL (more generally, in one API) should be
>>> available in all APIs. I am very much in favor of this.
>>>
>>> Enrico
>>>
>>> On 24.05.23 at 09:41, Hyukjin Kwon wrote:
>>>
>>> Hi all,
>>>
>>> I would like to discuss adding all SQL functions into the Scala, Python,
>>> and R APIs.
>>> We have around 175 SQL functions that do not exist in Scala, Python, and R.
>>> For example, we don't have pyspark.sql.functions.percentile, but you can
>>> invoke it as a SQL function, e.g., SELECT percentile(...).
>>>
>>> The reason why we do not have all functions in the first place is that
>>> we wanted to add only commonly used functions; see also
>>> https://github.com/apache/spark/pull/21318 (which I agreed with at the time).
>>>
>>> However, this has been raised multiple times over the years, from the
>>> OSS community, the dev mailing list, JIRAs, Stack Overflow, etc.
>>> It seems confusing which functions are available and which are not.
>>>
>>> Yes, we have a workaround: we can call any expression via expr("...")
>>> or call_udf("...", Columns ...).
>>> But it still seems not very user-friendly, because users expect these
>>> functions to be available under the functions namespace.
>>>
>>> Therefore, I would like to propose adding all expressions into all
>>> languages, so that Spark is simpler and less confusing, e.g., about
>>> which API is in functions or not.
>>>
>>> Any thoughts?