Yes, some were cases like you mentioned.
But I found myself explaining that reason to a lot of people, not only
developers but also users - I have been asked at conferences, over email, on
Slack, both internally and externally.
Then I realised that maybe we're doing something wrong. This is based on my
experience, so I wanted to open a discussion and see what others think about
this :-).

On Sat, 27 May 2023 at 00:19, Maciej <mszymkiew...@gmail.com> wrote:

> Weren't some of these functions provided only for compatibility and
> intentionally left out of the language APIs?
>
> --
> Best regards,
> Maciej
>
> On 5/25/23 23:21, Hyukjin Kwon wrote:
>
> I don't think it'd be a release blocker ... I think we can implement them
> across multiple releases.
>
> On Fri, May 26, 2023 at 1:01 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Thank you for the proposal.
>>
>> I'm wondering if we are going to consider them as release blockers or not.
>>
>> In general, I don't think making those SQL functions available in all
>> languages should be a release blocker
>> (especially in R, or in new Spark Connect languages like Go and Rust).
>>
>> If they are not release blockers, we could accept some existing or future
>> community PRs, but only before the feature freeze (= branch cut).
>>
>> Thanks,
>> Dongjoon.
>>
>>
>> On Wed, May 24, 2023 at 7:09 PM Jia Fan <fan...@apache.org> wrote:
>>
>>> +1
>>> It is important that different APIs can be used to call the same function.
>>>
>>> Ryan Berti <rbe...@netflix.com.invalid> wrote on Thursday, 25 May 2023
>>> at 01:48:
>>>
>>>> During my recent experience developing functions, I found that
>>>> identifying the locations to update (sql + connect functions.scala +
>>>> functions.py, FunctionRegistry, plus whatever is required for R) and the
>>>> standards for adding function signatures was not straightforward (should
>>>> you use optional args or overloaded functions? Which col/lit helpers
>>>> should be used when?). Are there docs describing all of the locations and
>>>> standards for defining a function? If not, that'd be great to have too.
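>>>> For example, here is a hypothetical functions.py-style wrapper for
>>>> percentile, written with an optional argument and routed through the
>>>> call_udf workaround mentioned earlier in the thread (the signature and
>>>> default here are my own sketch, not an established Spark convention):
>>>>
>>>>     from pyspark.sql import functions as F
>>>>     from pyspark.sql.column import Column
>>>>
>>>>     # Option A: one signature with an optional arg. The alternative
>>>>     # (Option B) would be separate overloads, as in functions.scala.
>>>>     def percentile(col: Column, percentage: float, frequency: int = 1) -> Column:
>>>>         return F.call_udf("percentile", col, F.lit(percentage), F.lit(frequency))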
>>>>
>>>> Ryan Berti
>>>>
>>>> Senior Data Engineer  |  Ads DE
>>>>
>>>> M 7023217573
>>>>
>>>> 5808 W Sunset Blvd  |  Los Angeles, CA 90028
>>>>
>>>>
>>>>
>>>> On Wed, May 24, 2023 at 12:44 AM Enrico Minack <i...@enrico.minack.dev>
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Functions available in SQL (more generally, in any one API) should be
>>>>> available in all APIs. I am very much in favor of this.
>>>>>
>>>>> Enrico
>>>>>
>>>>>
>>>>> On 24.05.23 at 09:41, Hyukjin Kwon wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I would like to discuss adding all SQL functions to the Scala, Python
>>>>> and R APIs.
>>>>> We have around 175 SQL functions that do not exist in Scala, Python
>>>>> and R.
>>>>> For example, we don’t have pyspark.sql.functions.percentile but you
>>>>> can invoke
>>>>> it as a SQL function, e.g., SELECT percentile(...).
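>>>>>
>>>>> To make the gap concrete, a minimal sketch (the column and table names
>>>>> here are made up):
>>>>>
>>>>>     # Works today: percentile is registered as a SQL function.
>>>>>     spark.sql("SELECT percentile(value, 0.5) FROM tbl").show()
>>>>>
>>>>>     # Does not work today: there is no such Python API yet.
>>>>>     from pyspark.sql import functions as F
>>>>>     F.percentile(F.col("value"), 0.5)  # AttributeError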
>>>>>
>>>>> The reason we do not have all functions in the first place is that we
>>>>> wanted to add only commonly used functions; see also
>>>>> https://github.com/apache/spark/pull/21318 (which I agreed with at the
>>>>> time).
>>>>>
>>>>> However, this has been raised multiple times over the years by the OSS
>>>>> community, on the dev mailing list, in JIRAs, on Stack Overflow, etc.
>>>>> It seems to be confusing which functions are available and which are not.
>>>>>
>>>>> Yes, we have a workaround: we can call any expression via expr("...")
>>>>> or call_udf("...", Columns ...).
>>>>> But it still seems not very user-friendly, because users expect these
>>>>> functions to be available under the functions namespace.
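>>>>>
>>>>> As a rough sketch of that workaround (assuming a DataFrame df with a
>>>>> numeric column named "value"; both names are made up):
>>>>>
>>>>>     from pyspark.sql.functions import expr, call_udf, lit
>>>>>
>>>>>     # Pass the whole expression as a SQL string.
>>>>>     df.select(expr("percentile(value, 0.5)")).show()
>>>>>
>>>>>     # Or look the function up by name and pass Columns directly.
>>>>>     df.select(call_udf("percentile", df.value, lit(0.5))).show()
>>>>>
>>>>> Either way, users get no autocompletion or type hints, unlike with a
>>>>> proper entry under the functions namespace.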
>>>>>
>>>>> Therefore, I would like to propose adding all these expressions to all
>>>>> language APIs, so that Spark is simpler and less confusing, e.g., about
>>>>> which API is available under functions and which is not.
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>>
>>>>>
>
