Re: SPARK-34600. Support user-defined types in Pandas UDF

2021-03-03 Thread attilapiros
Hi! First of all thanks for your contribution! PySpark is not an area I am familiar with but I can answer your question regarding Jira. The issue will be assigned to you when your change is in: > The JIRA will be Assigned to the primary contributor to the change as a > way of giving credit. If

SPARK-34600. Support user-defined types in Pandas UDF

2021-03-03 Thread Lei Xu
Hi, Here I have been working on a PR (https://github.com/apache/spark/pull/31735) that allows returning UserDefinedType from PandasUDF. Would love to see the feedback from the community. Btw, since this is my first patch on Spark, it seems that I don't have permission to assign the ticket

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread John Zhuge
+1 Good plan to move forward. Thank you all for the constructive and comprehensive discussions in this thread! Decisions on this important feature will have ramifications for years to come. On Wed, Mar 3, 2021 at 7:42 PM Wenchen Fan wrote: > +1 to this proposal. If people don't like the

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread Wenchen Fan
+1 to this proposal. If people don't like the ScalarFunction0,1, ... variants and prefer the "magical methods", then we can have a single ScalarFunction interface which has the row-parameter API (with a default implementation to fail) and documents to describe the "magical methods" (which can be
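The single-interface idea in the message above can be sketched roughly as follows. This is only an illustration in Python, not Spark's API: the class name `ScalarFunction` comes from the thread, while `produce_result`, the magic method name `invoke`, and the helper `call_function` are hypothetical names chosen for this sketch.

```python
class ScalarFunction:
    """Hypothetical stand-in for the single interface discussed in the thread."""

    def produce_result(self, row):
        # Row-parameter API: the default implementation fails, so an
        # implementation must override it or provide a magic method instead.
        raise NotImplementedError("no row-based implementation provided")


def call_function(fn, args):
    """Prefer a magic method named `invoke` (assumed name), else use the row API."""
    magic = getattr(fn, "invoke", None)
    if callable(magic):
        # Magic-method path: arguments are passed individually.
        return magic(*args)
    # Fallback path: arguments are packed into a single row-like tuple.
    return fn.produce_result(tuple(args))


class AddInts(ScalarFunction):
    # Provides only the magic method.
    def invoke(self, a, b):
        return a + b


class Concat(ScalarFunction):
    # Provides only the row-parameter API.
    def produce_result(self, row):
        return "".join(str(value) for value in row)


class Unimplemented(ScalarFunction):
    # Provides neither, so calling it hits the failing default.
    pass
```

In this sketch a caller never needs to know which style an implementation chose; the lookup decides, and a function that supplies neither fails at call time through the default.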

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread Ryan Blue
Good point, Dongjoon. I think we can probably come to some compromise here: - Remove SupportsInvoke since it isn’t really needed. We should always try to find the right method to invoke in the codegen path. - Add a default implementation of produceResult so that implementations don’t

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread Dongjoon Hyun
Hi, All. We shared many opinions from different perspectives. However, we didn't reach a consensus even on a partial merge by excluding something (on the PR by me, on this mailing thread by Wenchen). For the following claims, we have another alternative to mitigate them. > I don't like it

Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-03 Thread Hyukjin Kwon
Thank you so much, guys. It indeed took a long time and it was pretty tough this time :-). It was all possible because of your support. I sincerely appreciate it. On Thu, Mar 4, 2021 at 2:26 AM Dongjoon Hyun wrote: > It took a long time. Thank you, Hyukjin and all! > > Bests, > Dongjoon. > >

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Takeshi Yamamuro
+1 for releasing 2.4.8 and thanks, Liang-chi, for volunteering. Btw, does anyone roughly know how many v2.4 users there still are, based on some stats (e.g., # of v2.4.7 downloads from the official repos)? Have most users started using v3.x? On Thu, Mar 4, 2021 at 8:34 AM Hyukjin Kwon wrote: > Yeah, I

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Hyukjin Kwon
Yeah, I would prefer to have a 2.4.8 release as an EOL too. I don't mind having 2.4.9 as EOL too if more people prefer that. On Thu, Mar 4, 2021 at 4:01 AM Sean Owen wrote: > Sure, I'm even arguing that 2.4.8 could possibly be the final release. No > objection of course to continuing to

Re: Apache Spark Docker image repository

2021-03-03 Thread Ismaël Mejía
Since Spark 3.1.1 is out now, I was wondering if it would make sense to try to get some consensus about starting to release Docker images as part of Spark 3.2. Having ready-to-use images would definitely benefit adoption, in particular now that support for containerized runs via k8s has become GA. WDYT?

Re: Apache Spark 3.2 Expectation

2021-03-03 Thread Dongjoon Hyun
Hi, John. This thread aims to share your expectations and goals (and maybe work progress) for Apache Spark 3.2 because we are making this together. :) Bests, Dongjoon. On Wed, Mar 3, 2021 at 1:59 PM John Zhuge wrote: > Hi Dongjoon, > > Is it possible to get ViewCatalog in? The community

Re: Apache Spark 3.2 Expectation

2021-03-03 Thread John Zhuge
Hi Dongjoon, Is it possible to get ViewCatalog in? The community already had fairly detailed discussions. Thanks, John On Thu, Feb 25, 2021 at 8:57 AM Dongjoon Hyun wrote: > Hi, All. > > Since we have been preparing Apache Spark 3.2.0 in master branch since > December 2020, March seems to be

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Sean Owen
Sure, I'm even arguing that 2.4.8 could possibly be the final release. No objection of course to continuing to backport to 2.4.x where appropriate and cutting 2.4.9 later in the year as a final EOL release, either. On Wed, Mar 3, 2021 at 12:59 PM Dongjoon Hyun wrote: > Thank you, Sean. > > Ya,

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Dongjoon Hyun
Thank you, Sean. Ya, exactly, we can release 2.4.8 as a normal release first and use 2.4.9 as the EOL release. Since 2.4.7 was released almost 6 months ago, 2.4.8 is a little late in terms of the cadence. Bests, Dongjoon. On Wed, Mar 3, 2021 at 10:55 AM Sean Owen wrote: > For reference,

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Sean Owen
For reference, 2.3.x was maintained from February 2018 (2.3.0) to Sep 2019 (2.3.4), or about 19 months. The 2.4 branch should probably be maintained longer than that, as the final 2.x branch. 2.4.0 was released in Nov 2018. A final release in, say, April 2021 would be about 30 months. That feels
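The month counts quoted in the message above can be checked with a short calculation. This is a minimal sketch: the helper `months_between` is a hypothetical name, and the day of the month is set to 1 only because the message gives months, not exact release days.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end, ignoring the day of the month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# 2.3.0 (February 2018) to 2.3.4 (September 2019)
branch_2_3_months = months_between(date(2018, 2, 1), date(2019, 9, 1))

# 2.4.0 (November 2018) to a hypothetical final release in April 2021
branch_2_4_months = months_between(date(2018, 11, 1), date(2021, 4, 1))

print(branch_2_3_months, branch_2_4_months)
```

The first value matches the 19 months stated for 2.3.x; the second comes out at 29, consistent with the "about 30 months" estimate for 2.4.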

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Dongjoon Hyun
Thank you for volunteering as Apache Spark 2.4.8 release manager, Liang-chi! On Wed, Mar 3, 2021 at 10:13 AM Liang-Chi Hsieh wrote: > > Thanks Dongjoon! > > +1 and I volunteer to do the release of 2.4.8 if it passes. > > > Liang-Chi > > > > > -- > Sent from:

Re: [DISCUSS] SPIP: FunctionCatalog

2021-03-03 Thread Ryan Blue
Yes, GenericInternalRow is safe when types mismatch, at the cost of using Object[], and primitive types need boxing. The question is not whether to use the magic functions, which would not need boxing. The question here is whether to use multiple ScalarFunction interfaces. Those

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Liang-Chi Hsieh
Thanks Dongjoon! +1 and I volunteer to do the release of 2.4.8 if it passes. Liang-Chi -- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Dongjoon Hyun
Hi, All. We successfully completed the Apache Spark 3.1.1 and 3.0.2 releases and have already started the 3.2.0 discussion. Let's talk about branch-2.4 because there are some discussions on JIRA and GitHub about skipping backporting to 2.4. Since `branch-2.4` has been maintained well as LTS, I'd like to

Re: minikube and kubernetes cluster versions for integration testing

2021-03-03 Thread shane knapp ☠
please open a jira for this and assign it to me... shouldn't be too big of a deal to get this set up. On Tue, Mar 2, 2021 at 6:06 PM Dongjoon Hyun wrote: > Thank you for sharing and suggestion, Attila. > > Additionally, given the following information, > > - The latest Minikube is v1.18.0 with

Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-03 Thread Dongjoon Hyun
It took a long time. Thank you, Hyukjin and all! Bests, Dongjoon. On Wed, Mar 3, 2021 at 3:23 AM Gabor Somogyi wrote: > Good to hear and great work Hyukjin! > > On Wed, 3 Mar 2021, 11:15 Jungtaek Lim, > wrote: > >> Thanks Hyukjin for driving the huge release, and thanks everyone for >>

Re: Apache Spark 3.2 Expectation

2021-03-03 Thread Chang Chen
+1 for Data Source V2 Aggregate push down. On Sat, Feb 27, 2021 at 4:20 AM huaxin gao wrote: > Thanks Dongjoon and Xiao for the discussion. I would like to add Data > Source V2 Aggregate push down to the list. I am currently working on > JDBC Data Source V2 Aggregate push down, but the common code can be used

Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-03 Thread Gabor Somogyi
Good to hear and great work Hyukjin! On Wed, 3 Mar 2021, 11:15 Jungtaek Lim, wrote: > Thanks Hyukjin for driving the huge release, and thanks everyone for > contributing the release! > > On Wed, Mar 3, 2021 at 6:54 PM angers zhu wrote: > >> Great work, Hyukjin ! >> >> Bests, >> Angers >> >>

Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-03 Thread Jungtaek Lim
Thanks Hyukjin for driving the huge release, and thanks everyone for contributing the release! On Wed, Mar 3, 2021 at 6:54 PM angers zhu wrote: > Great work, Hyukjin ! > > Bests, > Angers > > On Wed, Mar 3, 2021 at 5:02 PM Wenchen Fan wrote: > >> Great work and congrats! >> >> On Wed, Mar 3, 2021 at 3:51 PM

Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-03 Thread angers zhu
Great work, Hyukjin! Bests, Angers On Wed, Mar 3, 2021 at 5:02 PM Wenchen Fan wrote: > Great work and congrats! > > On Wed, Mar 3, 2021 at 3:51 PM Kent Yao wrote: > >> Congrats, all! >> >> Bests, >> *Kent Yao * >> @ Data Science Center, Hangzhou Research Institute, NetEase Corp. >> *a spark enthusiast* >>

Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-03 Thread Wenchen Fan
Great work and congrats! On Wed, Mar 3, 2021 at 3:51 PM Kent Yao wrote: > Congrats, all! > > Bests, > *Kent Yao * > @ Data Science Center, Hangzhou Research Institute, NetEase Corp. > *a spark enthusiast* > *kyuubi is a unified multi-tenant JDBC > interface