Yes, some were cases like the ones you mentioned. But I found myself explaining that reason to a lot of people, not only developers but also users: I was asked at conferences, over email and Slack, both internally and externally. Then I realised that maybe we're doing something wrong. This is based on my own experience, so I wanted to open a discussion and see what others think about this :-).

On Sat, 27 May 2023 at 00:19, Maciej <mszymkiew...@gmail.com> wrote:

> Weren't some of these functions provided only for compatibility and
> intentionally left out of the language APIs?
>
> --
> Best regards,
> Maciej
>
> On 5/25/23 23:21, Hyukjin Kwon wrote:
>
> I don't think it'd be a release blocker. I think we can implement them
> across multiple releases.
>
> On Fri, May 26, 2023 at 1:01 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Thank you for the proposal.
>>
>> I'm wondering if we are going to consider them as release blockers or not.
>>
>> In general, I don't think those SQL functions should be available in all
>> languages as release blockers.
>> (Especially in R or new Spark Connect languages like Go and Rust).
>>
>> If they are not release blockers, we may allow some existing or future
>> community PRs only before feature freeze (= branch cut).
>>
>> Thanks,
>> Dongjoon.
>>
>>
>> On Wed, May 24, 2023 at 7:09 PM Jia Fan <fan...@apache.org> wrote:
>>
>>> +1
>>> It is important that different APIs can be used to call the same function
>>>
>>> On Thu, May 25, 2023 at 01:48, Ryan Berti <rbe...@netflix.com.invalid>
>>> wrote:
>>>
>>>> During my recent experience developing functions, I found that
>>>> identifying locations (sql + connect functions.scala + functions.py,
>>>> FunctionRegistry, + whatever is required for R) and standards for adding
>>>> function signatures was not straightforward (should you use optional args
>>>> or overload functions? which col/lit helpers should be used when?). Are
>>>> there docs describing all of the locations + standards for defining a
>>>> function? If not, that'd be great to have too.
>>>>
>>>> Ryan Berti
>>>>
>>>> Senior Data Engineer | Ads DE
>>>>
>>>> M 7023217573
>>>>
>>>> 5808 W Sunset Blvd | Los Angeles, CA 90028
>>>>
>>>>
>>>> On Wed, May 24, 2023 at 12:44 AM Enrico Minack <i...@enrico.minack.dev>
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Functions available in SQL (more generally, in one API) should be
>>>>> available in all APIs. I am very much in favor of this.
>>>>>
>>>>> Enrico
>>>>>
>>>>>
>>>>> On 24.05.23 at 09:41, Hyukjin Kwon wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I would like to discuss adding all SQL functions into the Scala, Python
>>>>> and R APIs.
>>>>> We have around 175 SQL functions that do not exist in Scala, Python
>>>>> and R.
>>>>> For example, we don’t have pyspark.sql.functions.percentile but you
>>>>> can invoke
>>>>> it as a SQL function, e.g., SELECT percentile(...).
>>>>>
>>>>> The reason why we do not have all functions in the first place is that
>>>>> we wanted to
>>>>> only add commonly used functions, see also
>>>>> https://github.com/apache/spark/pull/21318 (which I agreed with at that
>>>>> time)
>>>>>
>>>>> However, this has been raised multiple times over the years, from the OSS
>>>>> community, dev mailing list, JIRAs, Stack Overflow, etc.
>>>>> It seems to be confusing which functions are available and which are not.
>>>>>
>>>>> Yes, we have a workaround. We can call all expressions by expr("...")
>>>>> or call_udf("...", Columns ...)
>>>>> But still it seems that it’s not very user-friendly because users
>>>>> expect them to be available under the functions namespace.
>>>>>
>>>>> Therefore, I would like to propose adding all expressions into all
>>>>> languages so that Spark is simpler and less confusing, e.g., about which
>>>>> API is in functions or not.
>>>>>
>>>>> Any thoughts?
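
For readers who want to see the gap described in the quoted proposal in concrete terms, it comes down to a set difference between two lists of names. The sketch below is not from the thread: `missing_from_api` and the sample names are hypothetical and only illustrate the comparison. It uses the Python standard library alone, so it runs without a Spark installation.

```python
from typing import Iterable, List


def missing_from_api(sql_functions: Iterable[str],
                     api_functions: Iterable[str]) -> List[str]:
    """Return SQL function names that have no same-named API counterpart.

    Names are compared case-insensitively and the result is sorted.
    """
    api = {name.lower() for name in api_functions}
    sql = {name.lower() for name in sql_functions}
    return sorted(sql - api)


# Hypothetical sample data, for illustration only (not real Spark output).
sql_names = ["percentile", "abs", "upper"]
api_names = ["abs", "upper", "col", "lit"]

print(missing_from_api(sql_names, api_names))  # ['percentile']
```

In a live session, one could presumably obtain the SQL names from `SHOW FUNCTIONS` and the Python names from `dir(pyspark.sql.functions)`; that wiring is an assumption and is left out here. For any name in the result, the workaround mentioned in the thread still applies, for example calling the SQL function through `expr("percentile(...)")`.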