Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-04 Thread Liang-Chi Hsieh
Yeah, in short this is a great compromise approach and I do like to see this proposal move forward to next step. This discussion is valuable. Chao Sun wrote > +1 on Dongjoon's proposal. Great to see this is getting moved forward and > thanks everyone for the insightful discussion! > > > >

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-04 Thread Chao Sun
+1 on Dongjoon's proposal. Great to see this is getting moved forward and thanks everyone for the insightful discussion! On Thu, Mar 4, 2021 at 8:58 AM Ryan Blue wrote: > Okay, great. I'll update the SPIP doc and call a vote in the next day or > two. > > On Thu, Mar 4, 2021 at 8:26 AM Erik

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-04 Thread Ryan Blue
Okay, great. I'll update the SPIP doc and call a vote in the next day or two. On Thu, Mar 4, 2021 at 8:26 AM Erik Krogen wrote: > +1 on Dongjoon's proposal. This is a very nice compromise between the > reflective/magic-method approach and the InternalRow approach, providing > a lot of

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-04 Thread Erik Krogen
+1 on Dongjoon's proposal. This is a very nice compromise between the reflective/magic-method approach and the InternalRow approach, providing a lot of flexibility for our users, and allowing for the more complicated reflection-based approach to evolve at its own pace, since you can always fall

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread John Zhuge
+1 Good plan to move forward. Thank you all for the constructive and comprehensive discussions in this thread! Decisions on this important feature will have ramifications for years to come. On Wed, Mar 3, 2021 at 7:42 PM Wenchen Fan wrote: > +1 to this proposal. If people don't like the

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread Wenchen Fan
+1 to this proposal. If people don't like the ScalarFunction0,1, ... variants and prefer the "magical methods", then we can have a single ScalarFunction interface which has the row-parameter API (with a default implementation to fail) and documents to describe the "magical methods" (which can be

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread Ryan Blue
Good point, Dongjoon. I think we can probably come to some compromise here: - Remove SupportsInvoke since it isn’t really needed. We should always try to find the right method to invoke in the codegen path. - Add a default implementation of produceResult so that implementations don’t

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread Dongjoon Hyun
Hi, All. We shared many opinions in different perspectives. However, we didn't reach a consensus even on a partial merge by excluding something (on the PR by me, on this mailing thread by Wenchen). For the following claims, we have another alternative to mitigate it. > I don't like it

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread Ryan Blue
Yes, GenericInternalRow is safe if when type mismatches, with the cost of using Object[], and primitive types need to do boxing The question is not whether to use the magic functions, which would not need boxing. The question here is whether to use multiple ScalarFunction interfaces. Those

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-02 Thread Wenchen Fan
Yes, GenericInternalRow is safe if when type mismatches, with the cost of using Object[], and primitive types need to do boxing. And this is a runtime failure, which is absolutely worse than query-compile-time checks. Also, don't forget my previous point: users need to specify the type and index

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-01 Thread Ryan Blue
Thanks for adding your perspective, Erik! If the input is string type but the UDF implementation calls row.getLong(0), it returns wrong data I think this is misleading. It is true for UnsafeRow, but there is no reason why InternalRow should return incorrect values. The implementation in

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-23 Thread Wenchen Fan
l already includes an API for aggregate functions >>>>>>>>>>>>> and I think we would want to implement those right away. >>>>>>>>>>>>> >>>>>>>>>>>>> Processing ColumnBatch is something we can easily extend the >>>>>>>>&g

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-23 Thread Dongjoon Hyun
t 9:00 AM Andrew Melo < >>>>>>>>>>>> andrew.m...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hello Ryan, >>>>>>>>>>>>> >>>>>&g

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-22 Thread Wenchen Fan
. It sounds like >>>>>>>>>>>> there is a clear path forward for calling functions. Even without a >>>>>>>>>>>> prototype, the `invoke` plans show that Wenchen's suggested >>>>>>>>>>>> optimization

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-21 Thread Walaa Eldin Moustafa
luding bucket join support for DataSource v2, complete >>>>>>>>>>> support >>>>>>>>>>> for enforcing DataSource v2 distribution requirements on the write >>>>>>>>>>> path, >>>>>>>>>>> etc. I li

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-21 Thread Wenchen Fan
gt;>>>>> >> >>>>>>>>>> >> Chao >>>>>>>>>> >> >>>>>>>>>> >> On Fri, Feb 12, 2021 at 7:58 PM Hyukjin Kwon < >>>>>>>>>> gurwls...@gmail.com&

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-19 Thread Ryan Blue
gt; Basically I think the proposal makes sense to me and I'd like >>>>>>>>> to support the >>>>>>>>> >>>> SPIP as it looks like we have strong need for the important >>>>>>>>> feature. >>>>>>&g

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-18 Thread Wenchen Fan
;> >>>> alternative to me. Besides Wenchen's alternative >>>>>>>> implementation, is there a >>>>>>>> >>>> chance we also have the SupportsInvoke for comparison? >>>>>>>> >>>> >>>>>>&g

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-18 Thread Walaa Eldin Moustafa
>>>>>>> >>>> > it leading to more advanced functionalities like views with >>>>>>> shared custom >>>>>>> >>>> > functions, function pushdown, lambda, etc. It has already >>>>>>> borne fruit from >>>>>&

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-18 Thread Ryan Blue
roposed by Ryan. >>>>>> >>>> > >>>>>> >>>> > >>>>>> >>>> > On Fri, Feb 12, 2021 at 4:35 PM Owen O'Malley >>>>>> >>>> >>>>>> >>>> > owen.omalley@ >&

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-17 Thread Wenchen Fan
y of >>>>> >>>> >> getting to and calling UDFs. >>>>> >>>> >> >>>>> >>>> >> I like having the ScalarFunction as the API to call the UDFs. >>>>> It is >>>>> >>>>

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-17 Thread Ryan Blue
using the functions in other contexts like >>>> pushing down >>>> >>>> >> filters into the ORC & Parquet readers although there are a lot >>>> of >>>> >>>> >> details >>>> >>>> >

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-17 Thread Dongjoon Hyun
gt; >>>> > wrote: >>>>> >>>> > >>>>> >>>> >> I think this proposal is a very good thing giving Spark a >>>>> standard way of >>>>> >>>> >> getting to and calling UDFs. >>>>> &g

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-17 Thread Wenchen Fan
>>>> >> would >>>> >>>> >> also simplify using the functions in other contexts like >>>> pushing down >>>> >>>> >> filters into the ORC & Parquet readers although there are a lot >>>>

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-16 Thread Hyukjin Kwon
>>> >>>> >> On Fri, Feb 12, 2021 at 11:07 PM Erik Krogen >>> >>>> >>> >>>> > ekrogen@.com >>> >>>> >>> >>>> > >>> >>>> >> wrote: >>> >>>>

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-16 Thread Dongjoon Hyun
>> Spark >> >>>> >>> to >> >>>> >>> provide support for shareable UDFs, as well as make movement >> towards >> >>>> >>> more >> >>>> >>> advanced functionality like views which the

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-16 Thread Ryan Blue
heartedly. > >>>> >>> > >>>> >>> I find both of the proposed UDF APIs to be sufficiently > user-friendly > >>>> >>> and > >>>> >>> extensible. I generally think Wenchen's proposal is easier for a > user to > >>>> >>

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-16 Thread Andrew Melo
e proposed UDF APIs to be sufficiently user-friendly >>>> >>> and >>>> >>> extensible. I generally think Wenchen's proposal is easier for a user >>>> >>> to >>>> >>> work with in the common case, but has greater poten

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-16 Thread Walaa Eldin Moustafa
>>> >>> I find both of the proposed UDF APIs to be sufficiently >>>> user-friendly >>>> >>> and >>>> >>> extensible. I generally think Wenchen's proposal is easier for a >>>> user to >>>> >>> work with in the com

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-15 Thread Ye Xianjin
>>>> >>> I find both of the proposed UDF APIs to be sufficiently user-friendly >>>> >>> and >>>> >>> extensible. I generally think Wenchen's proposal is easier for a user >>>> >>> to >>>> >

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-15 Thread Ryan Blue
be sufficiently user-friendly >>> >>> and >>> >>> extensible. I generally think Wenchen's proposal is easier for a >>> user to >>> >>> work with in the common case, but has greater potential for confusing >>> >>> and >>

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Chao Sun
gt; hard-to-debug behavior due to use of reflective method signature >> >>> searches. >> >>> The merits on both sides can hopefully be more properly examined with >> >>> code, >> >>> so I look forward to seeing an implementation of Wenchen'

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Hyukjin Kwon
ncrete comparison. I am optimistic that we will not let the > >>> debate > >>> over this point unreasonably stall the SPIP from making progress. > >>> > >>> Thank you to both Wenchen and Ryan for your detailed consideration and > >>> evaluation

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Liang-Chi Hsieh
> Thank you to both Wenchen and Ryan for your detailed consideration and >>> evaluation of these ideas! >>> -- >>> *From:* Dongjoon Hyun > dongjoon.hyun@ > >>> *Sent:* Wednesday, February 10, 2021 6:06 PM >>> *To

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread John Zhuge
stall the SPIP from making progress. >> >> Thank you to both Wenchen and Ryan for your detailed consideration and >> evaluation of these ideas! >> -- >> *From:* Dongjoon Hyun >> *Sent:* Wednesday, February 10, 2021 6:06 PM >> *To:* R

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Owen O'Malley
led consideration and > evaluation of these ideas! > -- > *From:* Dongjoon Hyun > *Sent:* Wednesday, February 10, 2021 6:06 PM > *To:* Ryan Blue > *Cc:* Holden Karau ; Hyukjin Kwon < > gurwls...@gmail.com>; Spark Dev List ; Wenchen Fan < > cloud0...@gmail.com>

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-12 Thread Erik Krogen
for your detailed consideration and evaluation of these ideas! From: Dongjoon Hyun Sent: Wednesday, February 10, 2021 6:06 PM To: Ryan Blue Cc: Holden Karau ; Hyukjin Kwon ; Spark Dev List ; Wenchen Fan Subject: Re: [DISCUSS] SPIP: FunctionCatalog BTW, I forgot to add

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-10 Thread Dongjoon Hyun
BTW, I forgot to add my opinion explicitly in this thread because I was on the PR before this thread. 1. The `FunctionCatalog API` PR was made on May 9, 2019 and has been there for almost two years. 2. I already gave my +1 on that PR last Saturday because I agreed with the latest updated design

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-10 Thread Dongjoon Hyun
Hi, Ryan. We didn't move past anything (both yours and Wenchen's). What Wenchen suggested is double-checking the alternatives with the implementation to give more momentum to our discussion. Your new suggestion about optional extention also sounds like a new reasonable alternative to me. We are

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-10 Thread Ryan Blue
I don’t think we should so quickly move past the drawbacks of this approach. The problems are significant enough that using invoke is not sufficient on its own. But, I think we can add it as an optional extension to shore up the weaknesses. Here’s a summary of the drawbacks: - Magic function

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-10 Thread Dongjoon Hyun
Thank you all for making a giant move forward for Apache Spark 3.2.0. I'm really looking forward to seeing Wenchen's implementation. That would be greatly helpful to make a decision! > I'll implement my idea after the holiday and then we can have more effective discussions. We can also do

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Wenchen Fan
FYI: the Presto UDF API also takes individual parameters instead of the row parameter. I think this direction at least worth a try so that we can see the performance difference. It's also mentioned in the design doc as an alternative

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Wenchen Fan
Hi Holden, As Hyukjin said, following existing designs is not the principle of DS v2 API design. We should make sure the DS v2 API makes sense. AFAIK we didn't fully follow the catalog API design from Hive and I believe Ryan also agrees with it. I think the problem here is we were discussing

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Hyukjin Kwon
Just dropping a few lines. I remember that one of the goals in DSv2 is to correct the mistakes we made in the current Spark codes. It would not have much point if we will happen to just follow and mimic what Spark currently does. It might just end up with another copy of Spark APIs, e.g.

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Holden Karau
I think this proposal is a good set of trade-offs and has existed in the community for a long period of time. I especially appreciate how the design is focused on a minimal useful component, with future optimizations considered from a point of view of making sure it's flexible, but actual concrete

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Ryan Blue
I don’t think that using Invoke really works. The usability is poor if something goes wrong and it can’t handle varargs or parameterized use cases well. There isn’t a significant enough performance penalty for passing data as a row to justify making the API much more difficult and less expressive.

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-09 Thread Wenchen Fan
Hi Ryan, Sorry if I didn't make it clear. I was referring to implementing UDF using codegen, not calling the UDF with codegen or not. Calling UDF is Spark's job and it doesn't matter if the UDF API uses row or individual parameters, as you said. My point is, it's a bad idea to follow the

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-08 Thread Ryan Blue
Wenchen, There are a few issues with the Invoke approach, and I don’t think that it is really much better for the additional complexity of the API. First I think that you’re confusing codegen to call a function with codegen to implement a function. The non-goal refers to supporting codegen to

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-08 Thread Wenchen Fan
This is a very important feature, thanks for working on it! Spark uses codegen by default, and it's a bit unfortunate to see that codegen support is treated as a non-goal. I think it's better to not ask the UDF implementations to provide two different code paths for interpreted evaluation and

[DISCUSS] SPIP: FunctionCatalog

2021-02-08 Thread Ryan Blue
Hi everyone, I'd like to start a discussion for adding a FunctionCatalog interface to catalog plugins. This will allow catalogs to expose functions to Spark, similar to how the TableCatalog interface allows a catalog to expose tables. The proposal doc is available here: