Re: How does PySpark send "import" to the worker when executing Python UDFs?

2022-07-19 Thread Hyukjin Kwon
This is done by cloudpickle. They pickle global variables referred within the func together, and register it to the global imported modules. On Wed, 20 Jul 2022 at 00:55, Li Jin wrote: > Hi, > > I have a question about how does "imports" get send to the python worker. > > For example, I have >
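A minimal sketch of the idea described above, using only the standard library rather than cloudpickle itself: a function's bytecode lists the global names it refers to, so a pickler can look those names up and ship them along with the function (for an importable module, a reference that is re-imported on the worker is enough). The helper `referenced_globals` and the example function `area` are hypothetical names for illustration; cloudpickle's actual implementation is more involved.

```python
import math
import types


def referenced_globals(func):
    """Return the module-level names that func's bytecode refers to, with their values."""
    found = {}
    for name in func.__code__.co_names:
        # co_names also holds attribute names (e.g. "pi"); keep only real globals.
        if name in func.__globals__:
            found[name] = func.__globals__[name]
    return found


def area(radius):
    # `math` is not a local variable; it is looked up in the module globals at call time.
    return math.pi * radius ** 2


refs = referenced_globals(area)
module_names = sorted(
    name for name, value in refs.items() if isinstance(value, types.ModuleType)
)
print(module_names)
```

Under this simplification, the module `math` is discovered as a dependency of `area` without the function body ever containing an `import` statement, which is why the import does not need to be repeated inside the UDF.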

Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-11 Thread Hyukjin Kwon
+1 On Tue, 12 Jul 2022 at 06:58, Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.2.2. > > The vote is open until July 15th 1AM (PST) and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 votes. > > [ ] +1 Release this

Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Hyukjin Kwon
Yeah +1 On Thu, Jul 7, 2022 at 5:40 AM Dongjoon Hyun wrote: > Hi, All. > > Since Apache Spark 3.2.1 tag creation (Jan 19), new 197 patches > including 11 correctness patches arrived at branch-3.2. > > Shall we make a new release, Apache Spark 3.2.2, as the third release > at 3.2 line? I'd like

Re: Docker images for Spark 3.3.0 release are now available

2022-07-03 Thread Hyukjin Kwon
Thanks Gengliang. On Tue, 28 Jun 2022 at 11:13, Gengliang Wang wrote: > Hi all, > > The official Docker images for Spark 3.3.0 release are now available! > >- To run Spark with Scala/Java API only: >https://hub.docker.com/r/apache/spark >- To run Python on Spark:

Re: [SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-06-23 Thread Hyukjin Kwon
Alright, I'll be there after Holden's talk Thursday https://databricks.com/dataaisummit/session/tools-assisted-apache-spark-version-migrations-21-32 w/ Dongjoon (since he manages OSS Jenkins too). Let's have a quickie chat :-). On Thu, 23 Jun 2022 at 06:16, Hyukjin Kwon wrote: > Oops

Re: [SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-06-22 Thread Hyukjin Kwon
> I have another schedule at South Bay around 7PM and need to leave San > Francisco at least 5PM. > > Dongjoon. > > > On Wed, Jun 22, 2022 at 3:39 AM Hyukjin Kwon wrote: > >> (cc @Yikun Jiang @Gengliang Wang >> @Maxim Gekk >> @Yang,Jie(INF) FYI) >

Re: [SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-06-22 Thread Hyukjin Kwon
(cc @Yikun Jiang @Gengliang Wang @Maxim Gekk @Yang,Jie(INF) FYI) On Wed, 22 Jun 2022 at 19:34, Hyukjin Kwon wrote: > Couple of updates: > >- > >All builds passed now with all combinations we defined in the GitHub >Actions (e.g., branch-3.2, branch-3.3, J

Re: [SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-06-22 Thread Hyukjin Kwon
:49, Hyukjin Kwon wrote: > Just chatted offline - both I and Holden have multiple sessions :-). > Probably let's meet up for a quick chat after your talk > https://databricks.com/dataaisummit/session/what-do-when-your-job-goes-oom-night-flowcharts > ? > > > On Mon, 20 Jun 2022 a

Re: [SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-06-20 Thread Hyukjin Kwon
way meet up at Data AI summit to talk about build CI if > folks are > Interested? > > On Sun, Jun 19, 2022 at 7:50 PM Hyukjin Kwon wrote: > >> Increased the priority to a blocker - I don't think we can release with >> these build failures and poor CI >> >> On Mon,

[PSA] Please rebase and sync your master branch in your forked repository

2022-06-20 Thread Hyukjin Kwon
After https://github.com/apache/spark/pull/36922 gets merged, it requires your fork's master branch to be synced to the latest master branch in Apache Spark. Otherwise, builds would not be triggered in your PR.

Re: [SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-06-19 Thread Hyukjin Kwon
Increased the priority to a blocker - I don't think we can release with these build failures and poor CI On Mon, 20 Jun 2022 at 10:39, Hyukjin Kwon wrote: > There are too many test failures here. I pinged in some PRs I could > identify from a cursory look but would be great for you guys t

Re: [SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-06-19 Thread Hyukjin Kwon
There are too many test failures here. I pinged in some PRs I could identify from a cursory look but would be great for you guys to take a look if you guys haven't tested your change against other environments like JDK 11, Scala 2.13. On Mon, 20 Jun 2022 at 10:04, Hyukjin Kwon wrote: > Hi

[SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-06-19 Thread Hyukjin Kwon
Hi all, I am trying to rework GitHub Actions CI at https://issues.apache.org/jira/browse/SPARK-39515. Any help would be very appreciated.

Re: [VOTE][RESULT] SPIP: Spark Connect

2022-06-16 Thread Hyukjin Kwon
Awesome, I am excited to see this in Apache Spark. On Fri, 17 Jun 2022 at 08:37, Herman van Hovell wrote: > The vote passes with 17 +1s (10 binding +1s). > +1: > Herman van Hovell* > Matei Zaharia* > Yuming Wang > Hyukjin Kwon* > Chao Sun > L.C. Hsieh* > Huaxin Gao >

Re: Stickers and Swag

2022-06-14 Thread Hyukjin Kwon
Woohoo On Tue, 14 Jun 2022 at 15:04, Xiao Li wrote: > Hi, all, > > The ASF has an official store at RedBubble > that Apache Community > Development (ComDev) runs. If you are interested in buying Spark Swag, 70 > products featuring the Spark logo

Re: [VOTE][SPIP] Spark Connect

2022-06-13 Thread Hyukjin Kwon
+1 On Tue, 14 Jun 2022 at 08:50, Yuming Wang wrote: > +1. > > On Tue, Jun 14, 2022 at 2:20 AM Matei Zaharia > wrote: > >> +1, very excited about this direction. >> >> Matei >> >> On Jun 13, 2022, at 11:07 AM, Herman van Hovell < >> her...@databricks.com.INVALID> wrote: >> >> Let me kick off

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-08 Thread Hyukjin Kwon
to be made correctly. I lowered the priority to critical. I switch my -1 to 0. On Wed, 8 Jun 2022 at 15:17, Hyukjin Kwon wrote: > Arrrgh .. I am very sorry that I found this problem late. > RC 5 does not have the correct version of PySpark, see > https://github.com/apache/spark/blob/v

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-08 Thread Hyukjin Kwon
Arrrgh .. I am very sorry that I found this problem late. RC 5 does not have the correct version of PySpark, see https://github.com/apache/spark/blob/v3.3.0-rc5/python/pyspark/version.py#L19 I think the release script was broken because the version now has 'str' type, see
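To illustrate how a type annotation on the version string could break a text-based version bump, here is a small sketch. The two patterns and the `bump` helper are hypothetical and are not taken from Spark's release script; the version strings are made up as well.

```python
import re

# Hypothetical pattern that only recognizes an unannotated assignment.
STRICT = re.compile(r'^__version__ = "([^"]+)"$')
# A more tolerant pattern that also accepts an optional `: str` annotation.
TOLERANT = re.compile(r'^__version__(?:\s*:\s*str)?\s*=\s*"([^"]+)"$')

plain = '__version__ = "1.2.3.dev0"'
annotated = '__version__: str = "1.2.3.dev0"'


def bump(line, pattern, new_version):
    """Rewrite the version in `line` if `pattern` matches; otherwise return it unchanged."""
    if pattern.match(line) is None:
        return line  # silently left alone, which is how a stale version can slip through
    prefix = line.split("=")[0].rstrip()
    return f'{prefix} = "{new_version}"'


print(bump(plain, STRICT, "1.2.3"))
print(bump(annotated, STRICT, "1.2.3"))
print(bump(annotated, TOLERANT, "1.2.3"))
```

With the strict pattern, the annotated line passes through unchanged, so the development version would remain in the release artifact. The tolerant pattern handles both forms.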

Please stop creating new JIRA version such as 3.4

2022-06-06 Thread Hyukjin Kwon
Hi all, I see some people repeatedly create new versions such as "3.4" (it has to be "3.4.0") in JIRA. [image: Screen Shot 2022-06-07 at 2.29.02 PM.png] I manually check, remove and reassign them but I think it's the fifth time IIRC. Please avoid creating a new version such as 3.4 without

Re: [DISCUSS] SPIP: Spark Connect - A client and server interface for Apache Spark.

2022-06-06 Thread Hyukjin Kwon
What I like most about this SPIP are: 1. We could leverage this SPIP to dispatch the driver to the cluster (e.g., yarn-cluster or K8S cluster mode) with an interactive shell which Spark currently doesn't support. 2. Makes it easier for other languages to support, especially given that we talked

Re: Re: Re: Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread Hyukjin Kwon
There might be other blockers. Let's wait and see. On Tue, May 17, 2022 at 8:59 PM beliefer wrote: > OK. let it into 3.3.1 > > > At 2022-05-17 18:59:13, "Hyukjin Kwon" wrote: > > I think most users won't be affected since aggregate pushdown is disabled > by default.

Re: Re: Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread Hyukjin Kwon
And seems like it won't break it because adding a new method won't break binary compatibility. On Tue, 17 May 2022 at 19:59, Hyukjin Kwon wrote: > I think most users won't be affected since aggregate pushdown is disabled > by default. > > On Tue, 17 May 2022 at 19:53, bel

Re: Re: Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread Hyukjin Kwon
I think most users won't be affected since aggregate pushdown is disabled by default. On Tue, 17 May 2022 at 19:53, beliefer wrote: > If we not contains https://github.com/apache/spark/pull/36556, we will > break change when we merge it into 3.3.1 > > At 2022-05-17 18:26:12,

Re: Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread Hyukjin Kwon
tps://github.com/apache/spark/pull/36556 to RC2. > > > At 2022-05-17 17:37:13, "Hyukjin Kwon" wrote: > > That seems like a test-only issue. I made a quick followup at > https://github.com/apache/spark/pull/36576. > > On Tue, 17 May 2022 at 03:56, Sean Owen wrote:

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-17 Thread Hyukjin Kwon
That seems like a test-only issue. I made a quick followup at https://github.com/apache/spark/pull/36576. On Tue, 17 May 2022 at 03:56, Sean Owen wrote: > I'm still seeing failures related to the function registry, like: > > ExpressionsSchemaSuite: > - Check schemas for expression examples ***

Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-16 Thread Hyukjin Kwon
think it is a good idea > > > -- Original Message -- > *From:* "Hyukjin Kwon" ; > *Sent:* Tuesday, May 17, 2022, 11:26 AM > *To:* "dev"; > *Cc:* "Yikun Jiang";"Xinrong Meng"< > xinrong.m...@databricks.com>;"Xiao Li";"Taku

Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-16 Thread Hyukjin Kwon
Hi all, What about we introduce a component in JIRA "Pandas API on Spark", and use "PS" (pandas-on-Spark) in PR titles? We already use "ps" in many places when we: import pyspark.pandas as ps. This is similar to "Structured Streaming" in JIRA, and "SS" in PR title. I think it'd be easier to

Re: SIGMOD System Award for Apache Spark

2022-05-12 Thread Hyukjin Kwon
Awesome! On Fri, May 13, 2022 at 5:29 AM Mosharaf Chowdhury wrote: > Wow! Congratulations to everyone indeed. > > On Thu, May 12, 2022 at 3:44 PM Matei Zaharia > wrote: > >> Hi all, >> >> We recently found out that Apache Spark received >> the SIGMOD

Re: Contributor data in github-page no longer updated after May 1

2022-05-11 Thread Hyukjin Kwon
It's very likely a GitHub issue On Wed, 11 May 2022 at 18:01, Yang,Jie(INF) wrote: > Hi, teams > > > > The contributors data in the following page seems no longer updated after > May 1, Can anyone fix it? > > > > >

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-10 Thread Hyukjin Kwon
I expect to see RC2 too. I guess he just sticks to the standard, leaving the vote open till the end. It hasn't got enough +1s anyway :-). On Wed, 11 May 2022 at 10:17, Holden Karau wrote: > Technically release don't follow vetos (see > https://www.apache.org/foundation/voting.html ) it's up to

Re: PR builder not working now

2022-04-19 Thread Hyukjin Kwon
It's fixed now. On Tue, 19 Apr 2022 at 08:33, Hyukjin Kwon wrote: > It's still persistent. I will send an email to GitHub support today > > On Wed, 13 Apr 2022 at 11:04, Dongjoon Hyun > wrote: > >> Thank you for sharing that information! >> >> Bests >>

Re: PR builder not working now

2022-04-18 Thread Hyukjin Kwon
It's still persistent. I will send an email to GitHub support today On Wed, 13 Apr 2022 at 11:04, Dongjoon Hyun wrote: > Thank you for sharing that information! > > Bests > Dongjoon. > > > On Mon, Apr 11, 2022 at 10:29 PM Hyukjin Kwon wrote: > >> Hi all, >>

PR builder not working now

2022-04-11 Thread Hyukjin Kwon
Hi all, There is a bug in GitHub Actions' RESTful API (see https://github.com/HyukjinKwon/spark/actions?query=branch%3Adebug-ga-detection as an example). So, currently OSS PR builder doesn't work properly with showing a screen such as

[DISCUSS] Rename 'SQL' to 'SQL / DataFrame', and 'Query' to 'Execution' in SQL UI page

2022-03-27 Thread Hyukjin Kwon
Hi all, I have been investigating the improvements for Pandas API on Spark specifically in UI. I chatted with a couple of people, and decided to send an email here to discuss more. Currently, both SQL and DataFrame API are shown in “SQL” tab as below: [image: Screen Shot 2022-03-25 at 12.18.14

Re: Conda Python Env in K8S

2021-12-24 Thread Hyukjin Kwon
Can you share the logs, settings, environment, etc. and file a JIRA? There are integration test cases for K8S support, and I myself also tested it before. It would be helpful if you try what I did at https://databricks.com/blog/2020/12/22/how-to-manage-python-dependencies-in-pyspark.html and see

Re: Hadoop profile change to hadoop-2 and hadoop-3 since Spark 3.3

2021-12-11 Thread Hyukjin Kwon
and @tgra...@apache.org too On Sat, 11 Dec 2021 at 21:38, Hyukjin Kwon wrote: > cc @Holden Karau @DB Tsai @Imran > Rashid @Mridul Muralidharan FYI > > On Thu, 9 Dec 2021 at 14:07, angers zhu wrote: > >> Hi all, >> >> Since Spark 3.2, we have supported

Re: Hadoop profile change to hadoop-2 and hadoop-3 since Spark 3.3

2021-12-11 Thread Hyukjin Kwon
cc @Holden Karau @DB Tsai @Imran Rashid @Mridul Muralidharan FYI On Thu, 9 Dec 2021 at 14:07, angers zhu wrote: > Hi all, > > Since Spark 3.2, we have supported Hadoop 3.3.1 now, but its profile name > is *hadoop-3.2* (and *hadoop-2.7*) that is not correct. > So we made a change in

Re: Time for Spark 3.2.1?

2021-12-07 Thread Hyukjin Kwon
SGTM! On Wed, 8 Dec 2021 at 09:07, huaxin gao wrote: > I prefer to start rolling the release in January if there is no need to > publish it sooner :) > > On Tue, Dec 7, 2021 at 3:59 PM Hyukjin Kwon wrote: > >> Oh BTW, I realised that it's a holiday season soon this month i

Re: Time for Spark 3.2.1?

2021-12-07 Thread Hyukjin Kwon
> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a new >>> release of those wouldn't hurt either, if any of our release managers have >>> the time or inclination. 3.0.x is reaching unofficial end-of-life around >>> now anyway. >>> >>>

Time for Spark 3.2.1?

2021-12-06 Thread Hyukjin Kwon
Hi all, It's been two months since the Spark 3.2.0 release, and we have fixed many bugs and regressions. What do you guys think about rolling a Spark 3.2.1 release? cc @huaxin gao FYI, who I happened to hear is interested in rolling the maintenance release :-).

Re: [Apache Spark Jenkins] build system shutting down Dec 23th, 2021

2021-12-06 Thread Hyukjin Kwon
Thanks, Shane. On Tue, 7 Dec 2021 at 09:19, Dongjoon Hyun wrote: > I really want to thank you for all your help. > You've done so many things for the Apache Spark community. > > Sincerely, > Dongjoon > > > On Mon, Dec 6, 2021 at 12:02 PM shane knapp ☠ wrote: > >> hey everyone! >> >> after a

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2021-11-30 Thread Hyukjin Kwon
Adding @Holden Karau @Dongjoon Hyun @wuyi FYI On Tue, 30 Nov 2021 at 17:46, Yikun Jiang wrote: > Hey everyone, > > I'd like to start a discussion on "Support Volcano/Alternative Schedulers > Proposal". > > This SPIP is proposed to make spark k8s schedulers provide more YARN like > features

Re: Jira components cleanup

2021-11-28 Thread Hyukjin Kwon
Thanks Nicholas for raising this, and Sean for updating it! On Tue, 16 Nov 2021 at 03:27, Sean Owen wrote: > Done. Now let's see if that generated 86 update emails! > > On Mon, Nov 15, 2021 at 11:03 AM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> >>

Re: Supports Dynamic Table Options for Spark SQL

2021-11-15 Thread Hyukjin Kwon
My biggest concern with the syntax in hints is that Spark SQL's options can change results (e.g., CSV's header options) whereas hints are generally not designed to affect the external results if I am not mistaken. This is counterintuitive. I left the comment in the PR but what's the real benefit

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread Hyukjin Kwon
Awesome! On Sat, Nov 13, 2021 at 12:04 PM Xiao Li wrote: > Thank you! Great job! > > Xiao > > > On Fri, Nov 12, 2021 at 7:02 PM Mridul Muralidharan > wrote: > >> >> Nice job ! >> There are some nice API's which should be interesting to explore with JDK >> 17 :-) >> >> Regards. >> Mridul >> >>

Re: DataFrame.mapInArrow

2021-11-10 Thread Hyukjin Kwon
Sure, thanks Holden :-). On Thu, 11 Nov 2021 at 15:53, Holden Karau wrote: > Sorry I've been busy, I'll try and take a look tomorrow, excited to see > this progress though :) > > On Wed, Nov 10, 2021 at 9:01 PM Hyukjin Kwon wrote: > >> Last reminder: I plan to merge th

Re: DataFrame.mapInArrow

2021-11-10 Thread Hyukjin Kwon
Last reminder: I plan to merge this in a few more days. Any feedback and review would be very appreciated. On Tue, 9 Nov 2021 at 21:51, Hyukjin Kwon wrote: > Hi dev, > > I proposed DataFrame.mapInArrow ( > https://github.com/apache/spark/pull/34505) which allows users to > di

DataFrame.mapInArrow

2021-11-09 Thread Hyukjin Kwon
Hi dev, I proposed DataFrame.mapInArrow (https://github.com/apache/spark/pull/34505) which allows users to directly leverage Arrow batch to plug in other external systems easily. I would like to make sure this design of API covers most use cases, and would like to know if there is other feedback

Update Spark 3.3 release window?

2021-10-27 Thread Hyukjin Kwon
Hi all, Spark 3.2 is out. Shall we update the release window at https://spark.apache.org/versioning-policy.html? I am thinking of mid-March 2022 (5 months after the 3.2 release) for the code freeze and onward.

Re: [DISCUSS] SPIP: Storage Partitioned Join for Data Source V2

2021-10-26 Thread Hyukjin Kwon
Seems making sense to me. Would be great to have some feedback from people such as @Wenchen Fan @Cheng Su @angers zhu . On Tue, 26 Oct 2021 at 17:25, Dongjoon Hyun wrote: > +1 for this SPIP. > > On Sun, Oct 24, 2021 at 9:59 AM huaxin gao wrote: > >> +1. Thanks for lifting the current

Re: Adding Spark 4 to JIRA for targetted versions

2021-09-13 Thread Hyukjin Kwon
BTW, I vaguely remember that adding a new version affects the default version for the merging script to use for JIRA resolution. e.g., now it's 3.3.0 but it becomes 4.0.0 ... Maybe it's nicer to double check how it's affected. On Tue, 14 Sep 2021 at 13:32, Dongjoon Hyun wrote: > I'm fine to have the

Re: CRAN package SparkR

2021-09-01 Thread Hyukjin Kwon
ght be enough. This checks for > interactive() > > > https://github.com/apache/spark/blob/c6a2021fec5bab9069fbfba33f75d4415ea76e99/R/pkg/R/sparkR.R#L658 > > > On Tue, Aug 31, 2021 at 5:55 PM Hyukjin Kwon wrote: > >> Oh I missed this. Yes, can we simply get the user' conf

Re: CRAN package SparkR

2021-08-31 Thread Hyukjin Kwon
Oh I missed this. Yes, can we simply get the user's confirmation when we install.spark? IIRC, the auto installation is only triggered by the interactive shell so getting the user's confirmation should be fine. On Fri, 18 Jun 2021 at 02:54, Felix Cheung wrote: > Any suggestion or comment on this? They are

Re: -1s on committed but not released code?

2021-08-19 Thread Hyukjin Kwon
Yeah, I think we can discuss and revert it (or fix it) per the veto set. Problems are often found later, after code is merged. On Fri, 20 Aug 2021 at 04:08, Mridul Muralidharan wrote: > Hi Holden, > > In the past, I have seen discussions on the merged pr to thrash out the > details. > Usually it

Re: Time to start publishing Spark Docker Images?

2021-08-12 Thread Hyukjin Kwon
+1, I think we generally agreed upon having it. Thanks Holden for the heads-up and driving this. +@Dongjoon Hyun FYI On Thu, 22 Jul 2021 at 12:22, Kent Yao wrote: > +1 > > Bests, > > *Kent Yao * > @ Data Science Center, Hangzhou Research Institute, NetEase Corp. > *a spark enthusiast* > *kyuubi

Re: ASF board report draft for August

2021-08-09 Thread Hyukjin Kwon
> Are you referring to what version of Koala project? 1.8.1? Yes, the latest version 1.8.1. On Tue, 10 Aug 2021 at 11:07, Igor Costa wrote: > Hi Matei, nice update > > > Just one question, when you mention “ We are working on Spark 3.2.0 as > our next release, with a release candidate likely to

Re: ASF board report draft for August

2021-08-09 Thread Hyukjin Kwon
There is an SPIP passed and ready for Spark 3.2: pandas API on Spark: - JIRA: SPIP: Support pandas API layer on PySpark ( https://issues.apache.org/jira/browse/SPARK-34849) - Vote: [VOTE] SPIP: Support pandas API layer on PySpark ( https://www.mail-archive.com/dev@spark.apache.org/msg27605.html)

Re: Flaky build in GitHub Actions

2021-07-25 Thread Hyukjin Kwon
Spark repo. On Thu, 22 Jul 2021 at 09:40, Hyukjin Kwon wrote: > FYI, @Liang-Chi Hsieh is trying to control the memory > in the test base at https://github.com/apache/spark/pull/33447 which > looks almost promising now. > While I don't object to merge things, would need to close

Re: Flaky build in GitHub Actions

2021-07-21 Thread Hyukjin Kwon
I'm assuming if things pass Jenkins we are OK with merging yes? > > On Wed, Jul 21, 2021 at 10:03 AM Dongjoon Hyun > wrote: > >> Thank you, Hyukjin! >> >> Dongjoon. >> >> On Tue, Jul 20, 2021 at 8:53 PM Hyukjin Kwon wrote: >> >>> I filed a ticke

Re: Flaky build in GitHub Actions

2021-07-20 Thread Hyukjin Kwon
I filed a ticket at GitHub. I will share more details when I get a response from them. On Tue, 20 Jul 2021 at 19:30, Hyukjin Kwon wrote: > Hi all, > > Looks like there's something going on in the machines in GitHub Actions. > The build is now very flaky and keeps dying with symptoms

Flaky build in GitHub Actions

2021-07-20 Thread Hyukjin Kwon
Hi all, Looks like there's something going on in the machines in GitHub Actions. The build is now very flaky and keeps dying with symptoms like I guess out-of-memory (?). I will try to take a closer look tomorrow but it would be great if you guys find some time to take a look into it 

Re: [VOTE] Release Spark 3.0.3 (RC1)

2021-06-20 Thread Hyukjin Kwon
+1 On Mon, 21 Jun 2021 at 14:19, Dongjoon Hyun wrote: > +1 > > Thank you, Yi. > > Bests, > Dongjoon. > > > On Sat, Jun 19, 2021 at 6:57 PM Yuming Wang wrote: > >> +1 >> >> Tested a batch of production query with Thrift Server. >> >> On Sat, Jun 19, 2021 at 3:04 PM Mridul Muralidharan >> wrote: >>

Re: Apache Spark 3.2 Expectation

2021-06-17 Thread Hyukjin Kwon
*GA -> QA On Thu, 17 Jun 2021, 15:16 Hyukjin Kwon, wrote: > I think we would make sure treating these items in the list as exceptions > from the code freeze, and discourage to push new APIs and features though. > > GA period ideally we should focus on bug fixes and polishin

Re: Apache Spark 3.2 Expectation

2021-06-17 Thread Hyukjin Kwon
're working on them and expect to >>>>> have >>>>> >> them >>>>> >> in the new release. >>>>> >> >>>>> >> So I propose to postpone the branch cut date. >>>>> >>

Re: Apache Spark 3.2 Expectation

2021-06-15 Thread Hyukjin Kwon
d be September 2. >> >> I'm updating the release dates in >> https://github.com/apache/spark-website/pull/331 >> >> Thanks, >> Wenchen >> >> On Thu, Mar 11, 2021 at 11:17 PM Dongjoon Hyun >> wrote: >> >>> Thank you, Xiao, Wenchen a

Re: Apache Spark 3.0.3 Release?

2021-06-08 Thread Hyukjin Kwon
Yeah, +1 On Wed, 9 Jun 2021 at 12:06, Yi Wu wrote: > Hi, All. > > Since Apache Spark 3.0.2 tag creation (Feb 16), > new 119 patches (92 issues > > resolved) arrived at branch-3.0. > > Shall we make a new release, Apache Spark 3.0.3,

Re: [ANNOUNCE] Apache Spark 3.1.2 released

2021-06-01 Thread Hyukjin Kwon
awesome! On Wed, 2 Jun 2021 at 09:59, Dongjoon Hyun wrote: > We are happy to announce the availability of Spark 3.1.2! > > Spark 3.1.2 is a maintenance release containing stability fixes. This > release is based on the branch-3.1 maintenance branch of Spark. We strongly > recommend all 3.1 users to

Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread Hyukjin Kwon
+1 On Wed, 26 May 2021 at 09:00, Cheng Su wrote: > +1 (non-binding) > > > > Checked the related commits in commit history manually. > > > > Thanks! > > Cheng Su > > > > *From: *Takeshi Yamamuro > *Date: *Tuesday, May 25, 2021 at 4:47 PM > *To: *Dongjoon Hyun , dev > *Subject: *Re: [VOTE] Release

Re: Resolves too old JIRAs as incomplete

2021-05-19 Thread Hyukjin Kwon
Yeah, I wanted to discuss this. I agree since 2.4.x became EOL On Thu, 20 May 2021 at 10:54, Sean Owen wrote: > I agree. Such old JIRAs are 99% obsolete. If anyone objects to a > particular issue being closed, they can comment and we can reopen. It's a > very reversible thing. There is value in

Re: [ANNOUNCE] Apache Spark 2.4.8 released

2021-05-17 Thread Hyukjin Kwon
Yay! On Tue, 18 May 2021 at 12:57, Liang-Chi Hsieh wrote: > We are happy to announce the availability of Spark 2.4.8! > > Spark 2.4.8 is a maintenance release containing stability, correctness, and > security fixes. > This release is based on the branch-2.4 maintenance branch of Spark. We >

Re: Apache Spark 3.1.2 Release?

2021-05-17 Thread Hyukjin Kwon
+1 thanks for driving me On Tue, 18 May 2021, 09:33 Holden Karau, wrote: > +1 and thanks for volunteering to be the RM :) > > On Mon, May 17, 2021 at 4:09 PM Takeshi Yamamuro > wrote: > >> Thank you, Dongjoon~ sgtm, too. >> >> On Tue, May 18, 2021 at 7:34 AM Cheng Su wrote: >> >>> +1 for a

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-10 Thread Hyukjin Kwon
+1 On Mon, 10 May 2021 at 16:45, John Zhuge wrote: > No, just try to build a Java project with Maven RC repo. > > Validated checksum and signature; ran RAT checks; built the source and ran > unit tests. > > +1 (non-binding) > > On Sun, May 9, 2021 at 11:10 PM Liang-Chi Hsieh wrote: > >> Yea, I

Re: [VOTE] Release Spark 2.4.8 (RC3)

2021-04-28 Thread Hyukjin Kwon
+1 On Thu, 29 Apr 2021, 07:08 Sean Owen, wrote: > +1 from me too, same result as last time. > > On Wed, Apr 28, 2021 at 11:33 AM Liang-Chi Hsieh wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.4.8. >> >> The vote is open until May 4th at 9AM PST and

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
nn/spark-func-extras>A** library t**hat > brings useful functions from various modern database management systems to > **Apache > Spark <http://spark.apache.org/>.* > > > > On 04/15/2021 12:09,Hyukjin Kwon > wrote: > > The issue is fixed now. Please keep monito

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
The issue is fixed now. Please keep monitoring this. Thank you all! The Spark community is super active and cooperative! On Thu, 15 Apr 2021 at 11:01, Hyukjin Kwon wrote: > The fix will be straightforward. We can either, in Github Actions > workflow,: > - remove fast forward option and

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
The fix will be straightforward. In the GitHub Actions workflow, we can either: - remove the fast-forward option and see if it works - or git rebase before merging the branch On Thu, 15 Apr 2021 at 11:00, Hyukjin Kwon wrote: > I think it works mostly correctly as Dongjoon investigated and shared > (

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
ilapiros/spark/runs/2344911058?check_suite_focus=true >> (some other failures noticed) >> >> >> Bests, >> >> Kent >> >> Dongjoon Hyun wrote on Wed, 14 Apr 2021 at 23:34: >> > >> > Thank you again, Hyukjin. >> > >> >

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-14 Thread Hyukjin Kwon
t;>> On Wed, Apr 14, 2021 at 1:00 PM Gengliang Wang wrote: >>> >>>> Thanks for the amazing work, Hyukjin! >>>> I created a PR for trial and it looks well so far: >>>> https://github.com/apache/spark/pull/32158 >>>> >>>&g

Re: [PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-13 Thread Hyukjin Kwon
sitory). Please check the build notified by github-actions bot before merging it. There would be a followup work to reflect the status of the forked repository's build to the status of PR. On Wed, 14 Apr 2021 at 13:42, Hyukjin Kwon wrote: > Hi all, > > After https://github.com/apache/spark/pu

[PSA] Please read: PR builder now runs test and build in your forked repository

2021-04-13 Thread Hyukjin Kwon
Hi all, After https://github.com/apache/spark/pull/32092 merged, now we run the GitHub Actions workflows in your forked repository. In short, please see this example HyukjinKwon#34 1. You create a PR and your repository triggers the workflow.

Re: [DISCUSS] Build error message guideline

2021-04-13 Thread Hyukjin Kwon
are these > guidelines with the wider community. A good landing page for contributors > could be https://spark.apache.org/contributing.html. What do you think? > > Thank you, > > Karen Feng > > On Wed, Apr 7, 2021 at 8:19 PM Hyukjin Kwon wrote: > >> LGTM (I took a look

Re: [VOTE] Release Spark 2.4.8 (RC2)

2021-04-12 Thread Hyukjin Kwon
+1 On Tue, 13 Apr 2021, 02:58 Sean Owen, wrote: > +1 same result as last RC for me. > > On Mon, Apr 12, 2021, 12:53 AM Liang-Chi Hsieh wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.4.8. >> >> The vote is open until Apr 15th at 9AM PST and passes if a

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Hyukjin Kwon
business: we allow > individual projects to sign deals with Github to get dedicated resources. > It's a bit wasteful to ask every project to set up its own dev ops, > using Github Action is more convenient. Maybe we should raise it to Github? > > On Wed, Apr 7, 2021 at 9:31 PM Hyukjin Kwon w

Re: [DISCUSS] Build error message guideline

2021-04-07 Thread Hyukjin Kwon
LGTM (I took a look, and had some offline discussions w/ some corrections before it came out) On Thu, 8 Apr 2021 at 05:28, Karen wrote: > Hi all, > > As discussed in SPIP: Standardize Exception Messages in Spark ( >

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Hyukjin Kwon
anisation shares all resources across the projects. On Wed, 7 Apr 2021 at 22:04, Martin Grigorov wrote: > > > On Wed, Apr 7, 2021 at 3:41 PM Hyukjin Kwon wrote: > >> Hi Greg, >> >> I raised this thread to figure out a way that we can work together to >> resolve this

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Hyukjin Kwon
suffer from the lack of resources. I appreciate the resources provided to us but that does not resolve the issue of the development being slowed down. On Wed, 7 Apr 2021 at 17:52, Greg Stein wrote: > On Wed, Apr 7, 2021 at 12:25 AM Hyukjin Kwon wrote: > >> Hi all, >> >>

Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-06 Thread Hyukjin Kwon
Hi all, I am an Apache Spark PMC, and would like to know the future plan about GitHub Actions in ASF. Please also see the INFRA ticket I filed: https://issues.apache.org/jira/browse/INFRA-21646. I am aware of the limited GitHub Actions resources that are shared across all projects in ASF, and

Re: Support User Defined Types in pandas_udf for Spark's own Python API

2021-04-06 Thread Hyukjin Kwon
park is still not Pythonic enough. For example, I hear > complaints such as "why does > > PySpark follow pascalCase?" or "PySpark APIs are difficult to learn", > and APIs are very difficult to change > > in Spark (as I emphasized above). > >

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-04-04 Thread Hyukjin Kwon
I would +1 for just going ahead. That looks flaky to me too. Thanks Liang-Chi for driving this! On Sun, 4 Apr 2021, 18:17 Liang-Chi Hsieh, wrote: > Hi devs, > > Currently no open issues or ongoing issues targeting 2.4. > > On QA test dashboard, only spark-branch-2.4-test-sbt-hadoop-2.6 is in red

Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-29 Thread Hyukjin Kwon
The vote passed with the following 20 +1 votes and no -1 or +0 votes: Hyukjin Kwon* Dongjoon Hyun* Maciej Szymkiewicz Bryan Cutler Reynold Xin* Liang-Chi Hsieh Takeshi Yamamuro Xiao Li* Mridul Muralidharan* Gengliang Wang Matei Zaharia* Maxim Gekk 郑瑞峰 (Ruifeng Zheng) Denny Lee Kousuke Saruta

Re: Welcoming six new Apache Spark committers

2021-03-26 Thread Hyukjin Kwon
Congrats guys. Well deserved! On Sat, 27 Mar 2021, 05:28 Matei Zaharia, wrote: > Hi all, > > The Spark PMC recently voted to add several new committers. Please join me > in welcoming them to their new role! Our new committers are: > > - Maciej Szymkiewicz (contributor to PySpark) > - Max Gekk

Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-26 Thread Hyukjin Kwon
I'll start with my +1 (binding) On Fri, 26 Mar 2021, 23:52 Hyukjin Kwon, wrote: > Hi all, > > I’d like to start a vote for SPIP: Support pandas API layer on PySpark. > > The proposal is to embrace Koalas in PySpark to have the pandas API layer > on PySpark. >

[VOTE] SPIP: Support pandas API layer on PySpark

2021-03-26 Thread Hyukjin Kwon
Hi all, I’d like to start a vote for SPIP: Support pandas API layer on PySpark. The proposal is to embrace Koalas in PySpark to have the pandas API layer on PySpark. Please also refer to: - Previous discussion in dev mailing list: [DISCUSS] Support pandas API layer on PySpark

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-17 Thread Hyukjin Kwon
Thanks Nicholas for the pointer :-). On Thu, 18 Mar 2021, 00:11 Nicholas Chammas, wrote: > On Tue, Mar 16, 2021 at 9:15 PM Hyukjin Kwon wrote: > >> I am currently thinking we will have to convert the Koalas tests to use >> unittests to match with PySpark for now.

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-17 Thread Hyukjin Kwon
> > Especially when some people here are thinking about making it the > default/replacing the regular API I would strongly suggest defaulting to an > indexing mechanism that is not changing the query plan. > > Best, > Georg > > On Wed, 17 Mar 2021 at 12:13, Hyukj

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-17 Thread Hyukjin Kwon
>> would provide the hooks via Pandas' ExtensionArray interface to allow >> Spark to performantly interchange jagged/ragged lists to/from python >> UDFs. >> >> Cheers >> Andrew >> >> On Tue, Mar 16, 2021 at 8:15 PM Hyukjin Kwon wrote: >> >

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-16 Thread Hyukjin Kwon
Thank you guys for all your feedback. I will start working on SPIP with Koalas team. I would expect the SPIP can be sent late this week or early next week. I inlined and answered the questions unanswered as below: Is the community developing the pandas API layer for Spark interested in being

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-14 Thread Hyukjin Kwon
> better in Spark but it's well maintained. It adds some overhead to > maintaining Spark conversely. On the upside it makes it a little more > discoverable. Are there more 'synergies'? > > On Sat, Mar 13, 2021, 7:57 PM Hyukjin Kwon wrote: > >> Hi all, >> >>

[DISCUSS] Support pandas API layer on PySpark

2021-03-13 Thread Hyukjin Kwon
Hi all, I would like to start the discussion on supporting pandas API layer on Spark. If we have a general consensus on having it in PySpark, I will initiate and drive an SPIP with a detailed explanation about the implementation’s overview and structure. I would appreciate it if I can know

Re: [VOTE] SPIP: Add FunctionCatalog

2021-03-11 Thread Hyukjin Kwon
+1 On Fri, 12 Mar 2021 at 14:54, Jungtaek Lim wrote: > +1 (non-binding) Excellent description on SPIP doc! Thanks for the amazing > effort! > > On Wed, Mar 10, 2021 at 3:19 AM Liang-Chi Hsieh wrote: > >> >> +1 (non-binding). >> >> Thanks for the work! >> >> >> Erik Krogen wrote >> > +1 from me
