Thank you for the proposal.

I'm wondering whether we are going to consider these as release blockers or not.

In general, I don't think making those SQL functions available in all
languages should be a release blocker
(especially in R, or in new Spark Connect languages like Go and Rust).

If they are not release blockers, we could accept existing or future
community PRs only before the feature freeze (i.e., the branch cut).

Thanks,
Dongjoon.


On Wed, May 24, 2023 at 7:09 PM Jia Fan <fan...@apache.org> wrote:

> +1
> It is important that different APIs can be used to call the same function.
>
> Ryan Berti <rbe...@netflix.com.invalid> wrote on Thu, May 25, 2023 at 01:48:
>
>> During my recent experience developing functions, I found that
>> identifying the locations (sql + connect functions.scala + functions.py,
>> FunctionRegistry, + whatever is required for R) and the standards for
>> adding function signatures was not straightforward (should you use optional
>> args or overload functions? Which col/lit helpers should be used when?).
>> Are there docs describing all of the locations + standards for defining a
>> function? If not, that'd be great to have too.
>>
>> Ryan Berti
>>
>> Senior Data Engineer  |  Ads DE
>>
>> M 7023217573
>>
>> 5808 W Sunset Blvd  |  Los Angeles, CA 90028
>>
>>
>>
>> On Wed, May 24, 2023 at 12:44 AM Enrico Minack <i...@enrico.minack.dev>
>> wrote:
>>
>>> +1
>>>
>>> Functions available in SQL (more generally, in one API) should be
>>> available in all APIs. I am very much in favor of this.
>>>
>>> Enrico
>>>
>>>
>>> On 24.05.23 at 09:41, Hyukjin Kwon wrote:
>>>
>>> Hi all,
>>>
>>> I would like to discuss adding all SQL functions to the Scala, Python
>>> and R APIs.
>>> We have around 175 SQL functions that do not exist in Scala, Python and R.
>>> For example, we don't have pyspark.sql.functions.percentile, but you can
>>> invoke it as a SQL function, e.g., SELECT percentile(...).
>>>
>>> The reason why we did not add all functions in the first place is that
>>> we wanted to add only commonly used functions; see also
>>> https://github.com/apache/spark/pull/21318 (with which I agreed at the
>>> time).
>>>
>>> However, this has been raised multiple times over the years by the OSS
>>> community, on the dev mailing list, in JIRAs, on Stack Overflow, etc.
>>> It seems confusing which functions are available and which are not.
>>>
>>> Yes, we have a workaround. We can call all expressions via expr("...")
>>> or call_udf("...", Columns ...).
>>> But it still does not seem very user-friendly, because users expect
>>> these functions to be available under the functions namespace.
>>>
>>> Therefore, I would like to propose adding all expressions to all
>>> languages, so that Spark is simpler and less confusing, e.g., about
>>> which functions are in the functions API and which are not.
>>>
>>> Any thoughts?
>>>
>>>
>>>