Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Hyukjin Kwon
SGTM On Thu, 2 May 2024 at 02:06, Dongjoon Hyun wrote: > +1 for next Monday. > > Dongjoon. > > On Wed, May 1, 2024 at 8:46 AM Tathagata Das > wrote: > >> Next week sounds great! Thank you Wenchen! >> >> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan wrote: >> >>> Yea I think a preview release

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Hyukjin Kwon
Mich, It is a legacy config we should get rid of in the end, and it has been tested in production for a very long time. Spark should create a Spark table by default. On Tue, Apr 30, 2024 at 5:38 AM Mich Talebzadeh wrote: > Your point > > ".. t's a surprise to me to see that someone has different

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Hyukjin Kwon
+1 It's a legacy conf that we should eventually remove. Spark should create a Spark table by default, not a Hive table. Mich, for your workload, you can simply switch that conf off if it concerns you. We also enabled ANSI (that you agreed on). It's a bit awkward to stop in the middle

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread Hyukjin Kwon
+1 On Wed, Apr 17, 2024 at 3:57 AM L. C. Hsieh wrote: > +1 > > On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan wrote: > > > > +1 > > > > On Mon, Apr 15, 2024 at 12:31 PM Dongjoon Hyun > wrote: > >> > >> I'll start with my +1. > >> > >> - Checked checksum and signature > >> - Checked

Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Hyukjin Kwon
+1 On Sun, Apr 14, 2024 at 7:46 AM Chao Sun wrote: > +1. > > This feature is very helpful for guarding against correctness issues, such > as null results due to invalid input or math overflows. It’s been there for > a while now and it’s a good time to enable it by default as Spark enters > the

[VOTE][RESULT] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-03 Thread Hyukjin Kwon
The vote passes with 19 +1s (13 binding +1s). (* = binding) +1: Haejoon Lee Ruifeng Zheng(*) Dongjoon Hyun(*) Gengliang Wang(*) Mridul Muralidharan(*) Liang-Chi Hsieh(*) Takuya Ueshin(*) Kent Yao Chao Sun(*) Hussein Awala Xiao Li(*) Yuanjian Li(*) Denny Lee Felix Cheung(*) Bo Yang Xinrong Meng(*)

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-02 Thread Hyukjin Kwon
10:07 PM, Haejoon Lee > wrote: > >  > > +1 > > On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon wrote: > >> Hi all, >> >> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark >> Connect) >> >> JIRA <https://issues.apache.o

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Hyukjin Kwon
? > I was not able to find it, but I was on vacation, and so might have > missed this … > > > Regards, > Mridul > > On Sun, Mar 31, 2024 at 9:08 PM Haejoon Lee > wrote: > >> +1 >> >> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon >> wrote: >>

[VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Hyukjin Kwon
Hi all, I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark Connect) JIRA Prototype SPIP doc

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Hyukjin Kwon
One very good example is SparkR releases in Conda channel ( https://github.com/conda-forge/r-sparkr-feedstock). This is fully run by the community unofficially. On Tue, 19 Mar 2024 at 09:54, Mich Talebzadeh wrote: > +1 for me > > Mich Talebzadeh, > Dad | Technologist | Solutions Architect |

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Hyukjin Kwon
+1 On Mon, 11 Mar 2024 at 18:11, yangjie01 wrote: > +1 > > > > Jie Yang > > > > *From**: *Haejoon Lee > *Date**: *Monday, March 11, 2024 17:09 > *To**: *Gengliang Wang > *Cc**: *dev > *Subject**: *Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark > > > > +1 > > > > On Mon, Mar 11, 2024 at

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread Hyukjin Kwon
Is this related to https://github.com/apache/spark/pull/42428? cc @Yang,Jie(INF) On Mon, 4 Mar 2024 at 22:21, Jungtaek Lim wrote: > Shall we revisit this functionality? The API doc is built with individual > versions, and for each individual version we depend on other released > versions.

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-20 Thread Hyukjin Kwon
+1 On Tue, 20 Feb 2024 at 22:00, Cheng Pan wrote: > +1 (non-binding) > > - Build successfully from source code. > - Pass integration tests with Spark ClickHouse Connector[1] > > [1] https://github.com/housepower/spark-clickhouse-connector/pull/299 > > Thanks, > Cheng Pan > > > > On Feb 20,

Re: [FYI] SPARK-45981: Improve Python language test coverage

2023-12-02 Thread Hyukjin Kwon
Awesome! On Sat, Dec 2, 2023 at 2:33 PM Dongjoon Hyun wrote: > Hi, All. > > As a part of Apache Spark 4.0.0 (SPARK-44111), the Apache Spark community > starts to have test coverage for all supported Python versions from Today. > > - https://github.com/apache/spark/actions/runs/7061665420 > >

Help for testing Windows specific fix (SPARK-23015)

2023-11-21 Thread Hyukjin Kwon
Hi all, I used to have my Windows environment in another laptop but that laptop is broken now so I don't have Windows env to test Windows PRs out (e.g., https://github.com/apache/spark/pull/43706). If anyone has a Windows env, would appreciate it if you take a look at this. Thanks.

Re: On adding applyInArrow to groupBy and cogroup

2023-11-06 Thread Hyukjin Kwon
Sounds good, I'll review the PR. On Fri, 3 Nov 2023 at 14:08, Abdeali Kothari wrote: > Seeing more support for arrow based functions would be great. > Gives more control to application developers. And so pandas just becomes 1 > of the available options. > > On Fri, 3 Nov 2023, 21:23 Luca

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Hyukjin Kwon
Woohoo! On Tue, 3 Oct 2023 at 22:47, Hussein Awala wrote: > Congrats to all of you! > > On Tue 3 Oct 2023 at 08:15, Rui Wang wrote: > >> Congratulations! Well deserved! >> >> -Rui >> >> >> On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang wrote: >> >>> Congratulations to all! Well deserved! >>>

[RESULT] Updating documentation hosted for EOL and maintenance releases

2023-09-29 Thread Hyukjin Kwon
The vote passes with 9 +1s (6 binding +1s). (* = binding) +1: - Hyukjin Kwon * - Ruifeng Zheng * - Jiaan Geng - Yikun Jiang * - Herman van Hovell * - Michel Miotto Barbosa - Maciej Szymkiewicz * - Denny Lee - Yuanjian Li *

Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-26 Thread Hyukjin Kwon
Awesome! On Wed, 27 Sept 2023 at 11:02, Hussein Awala wrote: > I installed the package, tested it with kubernetes master from Jupyter, > and tested it with Spark Connect server, all looks good. > > On Tue, Sep 26, 2023 at 10:45 PM Yuanjian Li > wrote: > >> FYI, we received the handling from

[VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-25 Thread Hyukjin Kwon
Hi all, I would like to start the vote for updating documentation hosted for EOL and maintenance releases to improve the usability here, and in order for end users to read the proper and correct documentation. For discussion thread, please refer to

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Hyukjin Kwon
+1 On Tue, Sep 12, 2023 at 7:05 AM Xiao Li wrote: > +1 > > Xiao > > On Mon, Sep 11, 2023 at 10:53, Yuanjian Li wrote: > >> @Peter Toth I've looked into the details of this >> issue, and it appears that it's neither a regression in version 3.5.0 nor a >> correctness issue. It's a bug related to a new

[DISCUSS] Updating documentation hosted for EOL and maintenance releases

2023-08-30 Thread Hyukjin Kwon
Hi all, I would like to raise a discussion about updating documentation hosted for EOL and maintenance versions. To provide some context, we currently host the documentation for EOL versions of Apache Spark, which can be found at links like

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Hyukjin Kwon
Which Python version will run that stored procedure? All Python versions supported in PySpark How to manage external dependencies? Existing way we have https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html . In fact, this will use the external dependencies within your

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Hyukjin Kwon
+1 we should have this .. a lot of other projects and DBMSes have this too, and we currently don't have a way to handle them within Apache Spark. Disclaimer: I am the shepherd of this SPIP. On Thu, 31 Aug 2023 at 09:31, Allison Wang wrote: > Hi Mich, > > I've updated the permissions on the

Re: Welcome two new Apache Spark committers

2023-08-06 Thread Hyukjin Kwon
Woohoo! On Mon, 7 Aug 2023 at 11:28, Ruifeng Zheng wrote: > Congratulations! Peter and Xiduo! > > On Mon, Aug 7, 2023 at 10:13 AM Xiao Li wrote: > >> Congratulations, Peter and Xiduo! >> >> >> >> On Sun, Aug 6, 2023 at 19:08, Debasish Das wrote: >> >>> Congratulations Peter and Xiduo. >>> >>> On Sun, Aug 6,

Re: LLM script for error message improvement

2023-08-02 Thread Hyukjin Kwon
I think adding that dev tool script to improve the error message is fine. On Thu, 3 Aug 2023 at 10:24, Haejoon Lee wrote: > Dear contributors, I hope you are doing well! > > I see there are contributors who are interested in working on error > message improvements and persistent contribution,

Re: [VOTE] SPIP: XML data source support

2023-07-29 Thread Hyukjin Kwon
+1 On Sat, 29 Jul 2023 at 22:49, Maciej wrote: > +1 > > Best regards, > Maciej Szymkiewicz > > Web: https://zero323.net > PGP: A30CEF0C31A501EC > > On 7/29/23 11:28, Mich Talebzadeh wrote: > > +1 for me. > > Though Databricks did a good job releasing the code. > > GitHub - databricks/spark-xml:

Re: Spark 3.0.0 EOL

2023-07-24 Thread Hyukjin Kwon
It's already EOL On Mon, Jul 24, 2023 at 4:17 PM Pralabh Kumar wrote: > Hi Dev Team > > If possible , can you please provide the Spark 3.0.0 EOL timelines . > > Regards > Pralabh Kumar > > > > >

Re: Spark Docker Official Image is now available

2023-07-19 Thread Hyukjin Kwon
This is amazing, finally! On Thu, 20 Jul 2023 at 10:10, Yikun Jiang wrote: > The spark Docker Official Image is now available: > https://hub.docker.com/_/spark > > $ docker run -it --rm *spark* /opt/spark/bin/spark-shell > $ docker run -it --rm *spark*:python3 /opt/spark/bin/pyspark > $ docker

Re: [DISCUSS] SPIP: XML data source support

2023-07-19 Thread Hyukjin Kwon
ort is it to use the spark-xml library today? What's the > drawback to keeping this as an external library as-is? > > Best Regards, Martin > -- > *From:* Hyukjin Kwon > *Sent:* Wednesday, July 19, 2023 01:27 > *To:* Sandip Agarwala > *Cc:* dev@spark.

Re: [DISCUSS] SPIP: XML data source support

2023-07-18 Thread Hyukjin Kwon
> XML data in spark. Making spark-xml built-in will provide a better user > experience for Spark SQL and structured streaming. The proposal is to > inline code from the spark-xml package. > I am collaborating with Hyukjin Kwon, who is the original author of > spark-xml, for this e

Re: [VOTE][SPIP] Python Data Source API

2023-07-05 Thread Hyukjin Kwon
+1. See https://youtu.be/yj7XlTB1Jvc?t=604 :-). On Thu, 6 Jul 2023 at 09:15, Allison Wang wrote: > Hi all, > > I'd like to start the vote for SPIP: Python Data Source API. > > The high-level summary for the SPIP is that it aims to introduce a simple > API in Python for Data Sources. The idea

Re: Introducing English SDK for Apache Spark - Seeking Your Feedback and Contributions

2023-07-03 Thread Hyukjin Kwon
The demo was really amazing. On Tue, 4 Jul 2023 at 09:17, Farshid Ashouri wrote: > This is wonderful news! > > On Tue, 4 Jul 2023 at 01:14, Gengliang Wang wrote: > >> Dear Apache Spark community, >> >> We are delighted to announce the launch of a groundbreaking tool that >> aims to make Apache

Re: Time for Spark v3.5.0 release

2023-07-03 Thread Hyukjin Kwon
Yeah one day postponed shouldn't be a big deal. On Tue, Jul 4, 2023 at 7:10 AM Yuanjian Li wrote: > Hi All, > > According to the Spark versioning policy at > https://spark.apache.org/versioning-policy.html, should we cut > *branch-3.5* on *July 17th, 2023*? (We initially proposed January 16th,

Re: [ANNOUNCE] Apache Spark 3.4.1 released

2023-06-23 Thread Hyukjin Kwon
Thanks! On Sat, Jun 24, 2023 at 11:01 AM Mridul Muralidharan wrote: > > Thanks Dongjoon ! > > Regards, > Mridul > > On Fri, Jun 23, 2023 at 6:58 PM Dongjoon Hyun wrote: > >> We are happy to announce the availability of Apache Spark 3.4.1! >> >> Spark 3.4.1 is a maintenance release containing

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Hyukjin Kwon
+1 On Thu, 22 Jun 2023 at 02:20, Jacek Laskowski wrote: > +0 > > Pozdrawiam, > Jacek Laskowski > > "The Internals Of" Online Books > Follow me on https://twitter.com/jaceklaskowski > > > > > On Wed, Jun 21, 2023 at 5:11 PM

Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-21 Thread Hyukjin Kwon
+1 On Wed, 21 Jun 2023 at 14:23, yangjie01 wrote: > +1 > > > On 2023/6/21 at 13:20, "L. C. Hsieh" <vii...@gmail.com> wrote: > > > +1 > > > On Tue, Jun 20, 2023 at 8:48 PM Dongjoon Hyun > wrote: > > > > +1 > > > > Dongjoon > > > > On 2023/06/20 02:51:32 Jia Fan

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Hyukjin Kwon
Actually I support this idea in a way that Python developers don't have to learn Scala to write their own source (and separate packaging). This is more crucial especially when you want to write a simple data source that interacts with the Python ecosystem. On Tue, 20 Jun 2023 at 03:08, Denny Lee

Re: [VOTE] Apache Spark PMC asks Databricks to differentiate its Spark version string

2023-06-18 Thread Hyukjin Kwon
With the spirit of open source, -1. At least there have been other cases mentioned in the discussion thread, and solely doing it for one specific vendor would not solve the problem, and I wouldn't also expect to cast a vote for each case publicly. I would prefer to start this in the narrower

Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-18 Thread Hyukjin Kwon
The major concerns raised in the thread were that we should initiate the discussion for the below first: - Apache Spark 4.0.0 Preview (and Dates) - Apache Spark 4.0.0 Items - Apache Spark 4.0.0 Plan Adjustment before setting the timeline for Spark 4.0.0 because we're unclear on the picture of

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-15 Thread Hyukjin Kwon
I am supportive of setting the timeline for Spark 4.0, and I think it has to be done soon. If my understanding is correct, we better need to set up the goals and major changes to happen in 4.0.0? That one I agree with too. Having a preview sounds good to me too so people can try it out. Given

Re: Add user as a contributor

2023-06-14 Thread Hyukjin Kwon
You can open a PR first. When that's merged, the ticket will be assigned to you with contributor access. On Thu, Jun 15, 2023 at 1:07 PM Aman Raj wrote: > Hi team, > > Can someone please help giving contributor access to amanraj2520 username. > I have raised a Spark Ticket :

Re: [DISCUSS] SPIP: Add PySpark Test Framework

2023-06-13 Thread Hyukjin Kwon
Yeah, I have been thinking about this too, and Holden did some work here that this SPIP will reuse. I support this. On Wed, 14 Jun 2023 at 08:10, Amanda Liu wrote: > Hi all, > > I'd like to start a discussion about implementing an official PySpark test > framework. Currently, there's no

Re: [DISCUSS] Add SQL functions into Scala, Python and R API

2023-05-31 Thread Hyukjin Kwon
Thanks all. I created a JIRA at https://issues.apache.org/jira/browse/SPARK-43907. On Mon, 29 May 2023 at 09:12, Hyukjin Kwon wrote: > Yes, some were cases like you mentioned. > But I found myself explaining that reason to a lot of people, not only > developers but users - I

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-29 Thread Hyukjin Kwon
While I support going forward with a higher version, actually using Scala 2.13 by default is a big deal especially in a way that: - Users would likely download the built-in version assuming that it’s backward binary compatible. - PyPI doesn't allow specifying the Scala version, meaning

Re: [DISCUSS] Add SQL functions into Scala, Python and R API

2023-05-28 Thread Hyukjin Kwon
gt;>>> 5808 W Sunset Blvd | Los Angeles, CA 90028 >>>> <https://www.google.com/maps/search/5808+W+Sunset+Blvd%C2%A0+%7C%C2%A0+Los+Angeles,+CA+90028?entry=gmail=g> >>>> >>>> >>>> >>>> On Wed, May 24, 2023 at 12:44 AM Enr

Re: [DISCUSS] Add SQL functions into Scala, Python and R API

2023-05-25 Thread Hyukjin Kwon
>>> On Wed, May 24, 2023 at 12:44 AM Enrico Minack >>> wrote: >>> >>>> +1 >>>> >>>> Functions available in SQL (more general in one API) should be >>>> available in all APIs. I am very much in favor of this. >>>

[DISCUSS] Add SQL functions into Scala, Python and R API

2023-05-24 Thread Hyukjin Kwon
Hi all, I would like to discuss adding all SQL functions into the Scala, Python and R APIs. Around 175 SQL functions do not exist in Scala, Python and R. For example, we don’t have pyspark.sql.functions.percentile but you can invoke it as a SQL function, e.g., SELECT percentile(...). The

Re: [CONNECT] New Clients for Go and Rust

2023-05-24 Thread Hyukjin Kwon
I think we can just start this with a separate repo. I am fine with the second option too but in this case we would have to triage which language to add into the main repo. On Fri, 19 May 2023 at 22:28, Maciej wrote: > Hi, > > Personally, I'm strongly against the second option and have some >

Re: PR builder broken

2023-05-10 Thread Hyukjin Kwon
I think this happens globally https://www.githubstatus.com/ On Thu, May 11, 2023 at 6:50 AM Xingbo Jiang wrote: > Hi dev, > > I've seen multiple PR builder failures like below since this morning: > ``` > TypeError: Cannot read properties of undefined (reading 'head_sha') > at eval (eval at

Re: [VOTE] Release Apache Spark 3.2.4 (RC1)

2023-04-10 Thread Hyukjin Kwon
+1 On Tue, 11 Apr 2023 at 11:04, Ruifeng Zheng wrote: > +1 (non-binding) > > Thank you for driving this release! > > -- > Ruifeng Zheng > ruife...@foxmail.com > >

Re: [VOTE] Release Apache Spark 3.4.0 (RC6)

2023-04-06 Thread Hyukjin Kwon
Merged the fix. On Fri, 7 Apr 2023 at 10:07, Xinrong Meng wrote: > Thanks @yangjie01. I marked SPARK-39696 as a blocker. > > On Thu, Apr 6, 2023 at 4:35 PM yangjie01 wrote: > >> -1 for me due to this RC not include the fix of SPARK-39696, SPARK-39696 >> will fix a data race issue in access to

Re: Apache Spark 3.2.4 EOL Release?

2023-04-04 Thread Hyukjin Kwon
+1 On Wed, 5 Apr 2023 at 07:31, Mridul Muralidharan wrote: > > +1 > Sounds good to me. > > Thanks, > Mridul > > > On Tue, Apr 4, 2023 at 1:39 PM huaxin gao wrote: > >> +1 >> >> On Tue, Apr 4, 2023 at 11:17 AM Chao Sun wrote: >> >>> +1 >>> >>> On Tue, Apr 4, 2023 at 11:12 AM Holden Karau >>>

Re: [VOTE] Release Apache Spark 3.4.0 (RC3)

2023-03-09 Thread Hyukjin Kwon
BTW doing another RC isn't a very big deal (compared to what I did before :-) ) since it's not a canonical release yet. On Fri, Mar 10, 2023 at 7:58 AM Hyukjin Kwon wrote: > I guess directly tagging is fine too. > I don't mind cutting the RC4 right away either if that's what you

Re: [VOTE] Release Apache Spark 3.4.0 (RC3)

2023-03-09 Thread Hyukjin Kwon
I guess directly tagging is fine too. I don't mind cutting the RC4 right away either if that's what you prefer. On Fri, Mar 10, 2023 at 7:06 AM Xinrong Meng wrote: > Hi All, > > Thank you all for catching that. Unfortunately, the release script failed > to push the release tag

Re: [Question] Can't start Spark Connect

2023-03-08 Thread Hyukjin Kwon
Just doing a clean build with Maven, and running a test case like `SparkConnectServiceSuite` in IntelliJ should work. On Wed, 8 Mar 2023 at 15:02, Jia Fan wrote: > Hi developers, >I want to contribute some code for Spark Connect. Any doc for starters? > I want to debug

Re: [DISCUSS] Show Python code examples first in Spark documentation

2023-02-26 Thread Hyukjin Kwon
destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Sun, 26 Feb 2023

Re: [DISCUSS] Show Python code examples first in Spark documentation

2023-02-26 Thread Hyukjin Kwon
Probably it's worthwhile discussing the order for others but I would keep it separate from this thread to focus on Python as the default since that can be done as an incremental improvement. On Mon, Feb 27, 2023 at 3:36 AM Mich Talebzadeh wrote: > > To me as I stated before this is a

Re: [DISCUSS] Show Python code examples first in Spark documentation

2023-02-23 Thread Hyukjin Kwon
gt;>>> mich.talebza...@gmail.com> wrote: >>>> >>>>> If this is not just flip flopping the document pages and involves >>>>> other changes, then a proper impact analysis needs to be done to assess >>>>> the >>>>> eff

Re: [VOTE] Release Apache Spark 3.4.0 (RC1)

2023-02-23 Thread Hyukjin Kwon
Yes we should fix. I will take a look On Thu, 23 Feb 2023 at 07:32, Jonathan Kelly wrote: > Thanks! I was wondering about that ClientE2ETestSuite failure today, so > I'm glad to know that it's also being experienced by others. > > On a similar note, I am experiencing the following error when

Re: [DISCUSS] Show Python code examples first in Spark documentation

2023-02-22 Thread Hyukjin Kwon
how Python code examples first in Spark >> documentation >> >> +1 Good idea! >> >> On Thu, Feb 23, 2023 at 7:41 AM Jack Goodson >> wrote: >> >>> Good idea, at the company I work at we discussed using Scala as our >>> primary language b

Re: [DISCUSS] Show Python code examples first in Spark documentation

2023-02-22 Thread Hyukjin Kwon
+1 I like this idea too. On Thu, Feb 23, 2023 at 6:00 AM Allan Folting wrote: > Hi all, > > I would like to propose that we show Python code examples first in the > Spark documentation where we have multiple programming language examples. > An example is on the Quick Start page: >

Re: [DISCUSS] Make release cadence predictable

2023-02-15 Thread Hyukjin Kwon
>>> If people are OK with that discipline, sure. >>> A hard 6-month cycle would mean the minor releases are more frequent and >>> have less change in them. That's probably OK. We could also decide to >>> choose a longer cadence like 9 months, but I don't kno

Re: [VOTE][RESULT] Release Spark 3.3.2 (RC1)

2023-02-15 Thread Hyukjin Kwon
Awesome! On Thu, 16 Feb 2023 at 06:39, Dongjoon Hyun wrote: > Great! Thank you, Liang-Chi! > > Dongjoon. > > On Wed, Feb 15, 2023 at 9:22 AM L. C. Hsieh wrote: > >> The vote passes with 12 +1s (4 binding +1s). >> Thanks to all who helped with the release! >> >> (* = binding) >> +1: >> - Mridul

Re: Time for release v3.3.2

2023-01-30 Thread Hyukjin Kwon
+100! On Tue, 31 Jan 2023 at 10:54, Chao Sun wrote: > +1, thanks Liang-Chi for volunteering! > > Chao > > On Mon, Jan 30, 2023 at 5:51 PM L. C. Hsieh wrote: > > > > Hi Spark devs, > > > > As you know, it has been 4 months since Spark 3.3.1 was released on > > 2022/10, it seems a good time to

Re: Time for Spark 3.4.0 release?

2023-01-24 Thread Hyukjin Kwon
Thanks Xinrong. On Wed, 25 Jan 2023 at 12:01, Xinrong Meng wrote: > Hi All, > > Apache Spark 3.4 is cut as https://github.com/apache/spark/tree/branch-3.4 > . > > Thanks, > > Xinrong Meng > > On Wed, Jan 18, 2023 at 3:45 PM Hyukjin Kwon wrote: > >>

Re: Time for Spark 3.4.0 release?

2023-01-17 Thread Hyukjin Kwon
r point? What is > the estimate deadline for that? > > Enrico > > > Am 18.01.23 um 07:59 schrieb Hyukjin Kwon: > > These look like we can fix it after the branch-cut so should be fine. > > On Wed, 18 Jan 2023 at 15:57, Enrico Minack > wrote: > >> Hi Xinrong, >> >

Re: Time for Spark 3.4.0 release?

2023-01-17 Thread Hyukjin Kwon
3.4 to be ready by that time. > > Feel free to reply to the email if you have other ongoing big items for > Spark 3.4. > > Thanks, > > Xinrong Meng > > On Sat, Jan 7, 2023 at 9:16 AM Hyukjin Kwon wrote: > >> Thanks Xinrong. >> >> On Sat, Jan 7, 202

Re: Time for Spark 3.4.0 release?

2023-01-17 Thread Hyukjin Kwon
nch-3.4* at *18:30 PT, January 24, 2023*. Please ensure > your changes for Apache Spark 3.4 to be ready by that time. > > Feel free to reply to the email if you have other ongoing big items for > Spark 3.4. > > Thanks, > > Xinrong Meng > > On Sat, Jan 7, 2023 at 9:16 A

SparkR build with AppVeyor, broken by external reason

2023-01-16 Thread Hyukjin Kwon
Hi all, AppVeyor is currently broken assuming the flaky GitHub authorization issue ( https://help.appveyor.com/discussions/problems/11287-the-build-phase-is-set-to-msbuild-mode-default-but-no-visual-studio-project-or-solution-files-were-found ). AppVeyor build is specific to SparkR (on Windows)

Re: [DISCUSS] Deprecate DStream in 3.4

2023-01-12 Thread Hyukjin Kwon
+1 On Fri, 13 Jan 2023 at 08:51, Jungtaek Lim wrote: > bump for more visibility. > > On Wed, Jan 11, 2023 at 12:20 PM Jungtaek Lim < > kabhwan.opensou...@gmail.com> wrote: > >> Hi dev, >> >> I'd like to propose the deprecation of DStream in Spark 3.4, in favor of >> promoting Structured

Re: Base Docker image caching broken in CI

2023-01-11 Thread Hyukjin Kwon
Seems like it's fixed now! On Wed, 11 Jan 2023 at 15:58, Hyukjin Kwon wrote: > Hi all, > > ghcr is flaky now, so we will have to wait for a couple of days and see if > it gets fixed up soon. > See also > https://github.com/apache/spark/pull/39490#issuecomment-1378190

Base Docker image caching broken in CI

2023-01-10 Thread Hyukjin Kwon
Hi all, ghcr is flaky now, so we will have to wait for a couple of days and see if it gets fixed up soon. See also https://github.com/apache/spark/pull/39490#issuecomment-1378190658 Thanks Yikun for taking a look at this.

Re: Time for Spark 3.4.0 release?

2023-01-06 Thread Hyukjin Kwon
gt;>>>> On Tue, Jan 3, 2023 at 9:44 PM Rui Wang >>>>>>>> wrote: >>>>>>>> >>>>>>>>> +1 to cut the branch starting from a workday! >>>>>>>>> >>>>>>>>

Re: Time for Spark 3.4.0 release?

2023-01-03 Thread Hyukjin Kwon
SGTM +1 On Wed, Jan 4, 2023 at 2:13 PM Xinrong Meng wrote: > Hi All, > > Shall we cut *branch-3.4* on *January 16th, 2023*? We proposed January > 15th per > https://spark.apache.org/versioning-policy.html, but I would suggest we > postpone one day since January 15th is a Sunday. > > I would

Re: maven build failing in spark sql w/BouncyCastleProvider CNFE

2022-12-05 Thread Hyukjin Kwon
Steve, does the lower version of scala plugin work for you? If that solves it, we could temporarily downgrade for now. On Mon, 5 Dec 2022 at 22:23, Steve Loughran wrote: > trying to build spark master w/ hadoop trunk and the maven sbt plugin is > failing. This doesn't happen with the 3.3.5 RC0; > >

Re: Contributions needed: 4 higher order functions

2022-12-01 Thread Hyukjin Kwon
022, at 5:35 AM, Hyukjin Kwon wrote: > >  > Hi all, > > There are four higher order functions in our backlog: > > - https://issues.apache.org/jira/browse/SPARK-41235 > - https://issues.apache.org/jira/browse/SPARK-41234 > - https://issues.apache.org/jira/browse/SPARK-41233

Re: [VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Hyukjin Kwon
+1 On Thu, 1 Dec 2022 at 12:39, Mridul Muralidharan wrote: > > +1 > > Regards, > Mridul > > On Wed, Nov 30, 2022 at 8:55 PM Xingbo Jiang > wrote: > >> +1 >> >> On Wed, Nov 30, 2022 at 5:59 PM Jungtaek Lim < >> kabhwan.opensou...@gmail.com> wrote: >> >>> Starting with +1 from me. >>> >>> On

Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Hyukjin Kwon
+1 On Thu, 1 Dec 2022 at 08:10, Shixiong Zhu wrote: > +1 > > This is exciting. I agree with Jerry that this SPIP and continuous > processing are orthogonal. This SPIP itself would be a great improvement > and impact most Structured Streaming users. > > Best Regards, > Shixiong > > > On Wed, Nov

Contributions needed: 4 higher order functions

2022-11-30 Thread Hyukjin Kwon
Hi all, There are four higher order functions in our backlog: - https://issues.apache.org/jira/browse/SPARK-41235 - https://issues.apache.org/jira/browse/SPARK-41234 - https://issues.apache.org/jira/browse/SPARK-41233 - https://issues.apache.org/jira/browse/SPARK-41232 Would be a great chance

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Hyukjin Kwon
Thanks, Yuming. On Wed, 26 Oct 2022 at 16:01, L. C. Hsieh wrote: > Thank you for driving the release of Apache Spark 3.3.1, Yuming! > > On Tue, Oct 25, 2022 at 11:38 PM Dongjoon Hyun > wrote: > > > > It's great. Thank you so much, Yuming! > > > > Dongjoon > > > > On Tue, Oct 25, 2022 at 11:23

Re: Enforcing scalafmt on Spark Connect - connector/connect

2022-10-14 Thread Hyukjin Kwon
I personally like this idea. At least we now do this in PySpark, and it's pretty nice that you can just forget about formatting it manually by yourself. On Fri, 14 Oct 2022 at 16:37, Martin Grund wrote: > Hi folks, > > I'm reaching out to ask to gather input / consensus on the following >

Welcome Yikun Jiang as a Spark committer

2022-10-07 Thread Hyukjin Kwon
Hi all, The Spark PMC recently added Yikun Jiang as a committer on the project. Yikun is the major contributor of the infrastructure and GitHub Actions in Apache Spark as well as Kubernetes and PySpark. He has put a lot of effort into stabilizing and optimizing the builds so we all can work

Re: [VOTE][RESULT] SPIP: Support Docker Official Image for Spark

2022-09-25 Thread Hyukjin Kwon
There was a typo in the result email. I am resending now: The vote passes with 14 +1s (4 binding +1s). +1: Hyukjin Kwon* Ruifeng Zheng Yikun Jiang Qian Sun Kent Yao Rui Chen Xiangrui Meng* Gengliang Wang* Martin Grigorov Yang Jie Ankit Gupta Denny Lee Bryan Cutler Dongjoon Hyun* 0: None -1

[VOTE][RESULT] SPIP: Support Docker Official Image for Spark

2022-09-24 Thread Hyukjin Kwon
The vote passes with 14 +1s (4 binding +1s). +1: Hyukjin Kwon* Ruifeng Zheng Yikun Jiang Qian Sun Kent Yao Rui Chen Xiangrui Meng* Gengliang Wang* Martin Grigorov Yang Jie Ankit Gupta Denny Lee Bryan Cutler Dongjoon Hyun* 0: None (Tom has voiced some architectural concerns) -1: None

Re: [VOTE] SPIP: Support Docker Official Image for Spark

2022-09-21 Thread Hyukjin Kwon
Starting with my +1. On Thu, 22 Sept 2022 at 10:41, Hyukjin Kwon wrote: > Hi all, > > I would like to start a vote for SPIP: "Support Docker Official Image for > Spark" > > The goal of the SPIP is to add Docker Official Image(DOI) > <https://github.com/docker-

[VOTE] SPIP: Support Docker Official Image for Spark

2022-09-21 Thread Hyukjin Kwon
Hi all, I would like to start a vote for SPIP: "Support Docker Official Image for Spark" The goal of the SPIP is to add Docker Official Image(DOI) to ensure the Spark Docker images meet the quality standards for Docker images, to provide these

Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-21 Thread Hyukjin Kwon
Given that support, I will start the vote officially. On Thu, 22 Sept 2022 at 08:40, Yikun Jiang wrote: > @Ankit > > Thanks for your support! Your questions are very valuable, but this SPIP > is just a start point to cover existing apache/spark image features first. > And we will also set up a

Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-18 Thread Hyukjin Kwon
+1 On Mon, 19 Sept 2022 at 09:15, Yikun Jiang wrote: > Hi, all > > I would like to start the discussion for supporting Docker Official Image > for Spark. > > This SPIP is proposed to add Docker Official Image(DOI) > to ensure the Spark >

Creating a new component "Connect" in JIRA

2022-09-16 Thread Hyukjin Kwon
Hi all, I created a new component called "Connect" temporarily for the Spark Connect project, see https://issues.apache.org/jira/browse/SPARK-39375 because a lot of changes will be made in an isolated location, and the concept itself is pretty isolated as a separate component In addition, this

Re: Time for Spark 3.3.1 release?

2022-09-12 Thread Hyukjin Kwon
+1 On Tue, 13 Sept 2022 at 06:45, Gengliang Wang wrote: > +1. > Thank you, Yuming! > > On Mon, Sep 12, 2022 at 12:10 PM L. C. Hsieh wrote: > >> +1 >> >> Thanks Yuming! >> >> On Mon, Sep 12, 2022 at 11:50 AM Dongjoon Hyun >> wrote: >> > >> > +1 >> > >> > Thanks, >> > Dongjoon. >> > >> > On

Re: Contributions and help needed in SPARK-40005

2022-08-30 Thread Hyukjin Kwon
Oh, that's a mistake. Please just go ahead and reuse that JIRA :-). You can just create a PR reusing the same JIRA ID for functions.py On Wed, 31 Aug 2022 at 01:18, Khalid Mammadov wrote: > Hi @Hyukjin Kwon > > I see you have resolved the JIRA and I got some more thi

Re: Contributions and help needed in SPARK-40005

2022-08-19 Thread Hyukjin Kwon
at 16:50, Khalid Mammadov wrote: > I am picking up "functions.py" if no one is already > > On Fri, 19 Aug 2022, 07:56 Khalid Mammadov, > wrote: > >> I thought it's all finished (checked few). Do you have list of those 50%? >> Happy to contribute >>

Re: Contributions and help needed in SPARK-40005

2022-08-18 Thread Hyukjin Kwon
<https://issues.apache.org/jira/browse/SPARK-40010> is built to track > progress. > > On Tue, Aug 9, 2022 at 10:58, Hyukjin Kwon <gurwls...@gmail.com> wrote: > > Please go ahead. Would be very appreciated. >> >> On Tue, 9 Aug 2022 at 11:58, Qi

Re: Welcoming three new PMC members

2022-08-09 Thread Hyukjin Kwon
Congrats everybody! On Wed, 10 Aug 2022 at 05:50, Mridul Muralidharan wrote: > > Congratulations ! > Great to have you join the PMC !! > > Regards, > Mridul > > On Tue, Aug 9, 2022 at 11:57 AM vaquar khan wrote: > >> Congratulations >> >> On Tue, Aug 9, 2022, 11:40 AM Xiao Li wrote: >> >>> Hi

Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Hyukjin Kwon
Hi all, The Spark PMC recently added Xinrong Meng as a committer on the project. Xinrong is the major contributor of PySpark especially Pandas API on Spark. She has guided a lot of new contributors enthusiastically. Please join me in welcoming Xinrong!

Re: Contributions and help needed in SPARK-40005

2022-08-08 Thread Hyukjin Kwon
Please go ahead. Would be very appreciated. On Tue, 9 Aug 2022 at 11:58, Qian SUN wrote: > Hi Hyukjin > > I would like to do some work and pick up *Window.py *if possible. > > Thanks, > Qian > > On Tue, Aug 9, 2022 at 10:41, Hyukjin Kwon wrote: > >> Thanks Khalid for taking

Re: Contributions and help needed in SPARK-40005

2022-08-08 Thread Hyukjin Kwon
o good to track these pending issues somewhere to > avoid effort duplication. > > For example, I would like to pick up *union* and *union all* if no > one has already. > > Thanks, > Khalid > > > On Mon, Aug 8, 2022 at 1:44 PM Hyukjin Kwon wrote: > >> Hi all, >>

Contributions and help needed in SPARK-40005

2022-08-08 Thread Hyukjin Kwon
Hi all, I am trying to improve PySpark documentation especially: - Make the examples self-contained, e.g., https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html - Document Parameters

Re: How does PySpark send "import" to the worker when executing Python UDFs?

2022-07-19 Thread Hyukjin Kwon
This is done by cloudpickle. It pickles the global variables referenced within the function together with the function, and registers them in the globally imported modules. On Wed, 20 Jul 2022 at 00:55, Li Jin wrote: > Hi, > > I have a question about how does "imports" get send to the python worker. > > For example, I have >
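
The reply above attributes the shipping of imports to cloudpickle. As a rough illustration of the underlying idea only, and not cloudpickle's actual implementation, the standard-library sketch below inspects a function's bytecode to find which global names it refers to, which is the information a by-value pickler would need in order to carry a module reference along with the function. The names `referenced_globals` and `area` are hypothetical and introduced here purely for the example.

```python
import math
import types


def referenced_globals(func):
    """Map each global name used in func's bytecode to its current value.

    Assumption: only top-level names in func.__code__.co_names are checked;
    nested functions and closures are ignored in this sketch.
    """
    names = func.__code__.co_names
    return {n: func.__globals__[n] for n in names if n in func.__globals__}


def area(r):
    # Uses the module-level name `math`, which was imported above.
    return math.pi * r * r


found = referenced_globals(area)
# Keep only the names that are bound to imported modules.
modules = sorted(n for n, v in found.items() if isinstance(v, types.ModuleType))
print(modules)
```

Attribute names such as `pi` also appear in `co_names`, so the lookup against `func.__globals__` is what separates real globals from attribute accesses.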
