Re: How do you debug a code-generated aggregate?

2024-02-12 Thread Herman van Hovell
There is no really easy way of getting the state of the aggregation buffer, unless you are willing to modify the code generation and sprinkle in some logging. What I would start with is dumping the generated code by calling explain('codegen') on the DataFrame. That helped me to find similar
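A minimal sketch of that first step, assuming PySpark 3.0 or later, where DataFrame.explain accepts mode="codegen" and prints the generated code to stdout. The helper name codegen_text and the example query in demo() are illustrative assumptions, not part of the thread.

```python
import contextlib
import io


def codegen_text(df):
    """Return whatever df.explain(mode="codegen") prints, as a string.

    Assumes explain() writes through Python's stdout, which is how
    PySpark prints plans; `df` is any object exposing that method.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain(mode="codegen")
    return buf.getvalue()


def demo():
    """Hypothetical usage: needs a local Spark installation to run."""
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("codegen-dump")
        .getOrCreate()
    )
    # A small aggregate so the dumped code contains an aggregation buffer.
    df = (
        spark.range(100)
        .groupBy((F.col("id") % 3).alias("k"))
        .agg(F.sum("id").alias("total"))
    )
    print(codegen_text(df))
    spark.stop()
```

Calling demo() prints the generated Java for each whole-stage codegen subtree. Capturing it as a string makes it easier to search for the buffer variables before deciding where logging would have to go.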

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Herman van Hovell
+1 On Tue, Sep 26, 2023 at 10:39 AM yangjie01 wrote: > +1 > > > > *From:* Yikun Jiang > *Date:* Tuesday, September 26, 2023, 18:06 > *To:* dev > *Cc:* Hyukjin Kwon , Ruifeng Zheng < > ruife...@apache.org> > *Subject:* Re: [VOTE] Updating documentation hosted for EOL and maintenance > releases > > > >

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-06 Thread Herman van Hovell
Tested connect, and everything looks good. +1 On Wed, Sep 6, 2023 at 8:11 AM Yuanjian Li wrote: > Please vote on releasing the following candidate(RC4) as Apache Spark > version 3.5.0. > > The vote is open until 11:59pm Pacific time Sep 8th and passes if a > majority +1 PMC votes are cast,

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-16 Thread Herman van Hovell
Hi Yuanjian, For the ongoing encoder work for the connect scala client I'd like to get the following tickets in: - SPARK-44396 : Direct Arrow Deserialization - SPARK-9 :

Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-19 Thread Herman van Hovell
Dongjoon, I am not sure if I follow the line of thought here. Multiple people have asked for clarification on what Spark 4.0 would mean (Holden, Mridul, Jia & Xiao). You can - for the record - also add me to this list. However you choose to single out Xiao because asks this

Re: [VOTE] Release Apache Spark 3.4.0 (RC5)

2023-03-30 Thread Herman van Hovell
+1 On Thu, Mar 30, 2023 at 11:05 PM Sean Owen wrote: > +1 same result from me as last time. > > On Thu, Mar 30, 2023 at 3:21 AM Xinrong Meng > wrote: > >> Please vote on releasing the following candidate(RC5) as Apache Spark >> version 3.4.0. >> >> The vote is open until 11:59pm Pacific time

Re: Ammonite as REPL for Spark Connect

2023-03-23 Thread Herman van Hovell
g else I am missing ? > > Regards, > Mridul > > > > On Wed, Mar 22, 2023 at 6:58 PM Herman van Hovell > wrote: > >> Ammonite is maintained externally by Li Haoyi et al. We are including it >> as a 'provided' dependency. The integration bits and pieces (1 file) are

Re: Ammonite as REPL for Spark Connect

2023-03-22 Thread Herman van Hovell
Apache Spark ? > > Regards , > Mridul > > > > On Wed, Mar 22, 2023 at 6:50 PM Herman van Hovell > wrote: > >> Hi All, >> >> For Spark Connect Scala Client we are working on making the REPL >> experience a bit nicer <https://github.com/apache/spar

Ammonite as REPL for Spark Connect

2023-03-22 Thread Herman van Hovell
Hi All, For Spark Connect Scala Client we are working on making the REPL experience a bit nicer. In a nutshell we want to give users a turnkey Scala REPL that works even if you don't have a Spark distribution on your machine (through coursier

Re: [Question] Can't start Spark Connect

2023-03-08 Thread Herman van Hovell
Hi Jia, How are you building connect? Kind regards, Herman On Wed, Mar 8, 2023 at 8:48 AM Jia Fan wrote: > Thanks for reply, > I had done clean build with maven few times. But always report > >

Re: [VOTE] Release Apache Spark 3.4.0 (RC1)

2023-02-22 Thread Herman van Hovell
Hi All, Thanks for testing the 3.4.0 RC! I apologize for the maven testing failures for the Spark Connect Scala Client. We will try to get those sorted as soon as possible. This is an artifact of having multiple build systems, and only running CI for one (SBT). That, however, is a debate for

Re: Deploying stage-level scheduling for Spark SQL

2022-09-29 Thread Herman van Hovell
I think issue 2 is caused by adaptive query execution. This will break apart queries into multiple jobs; each subsequent job will generate an RDD that is based on previous ones. As for 1, I am not sure how much you want to expose to an end user here. SQL is declarative, and it does not specify how

Re: Why are hash functions seeded with 42?

2022-09-26 Thread Herman van Hovell
Sorry about that, it made me laugh 6 years ago, I didn't expect this to come back and haunt me :)... There are ways out of this, but none of them is particularly appealing: - Add a SQL conf to make the value configurable. - Add a seed parameter to the function. I am not sure if we can make this work
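The "seed parameter" option can be sketched as follows. This assumes a Murmur3-style 32-bit hash of a single 4-byte integer with the default seed kept at 42. The function name hash_int is hypothetical, and the output has not been checked against Spark's own hash(), so treat it as an illustration of the API shape only.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    # Rotate a 32-bit value left by r bits.
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k1(k1):
    k1 = (k1 * 0xCC9E2D51) & MASK32
    k1 = _rotl32(k1, 15)
    return (k1 * 0x1B873593) & MASK32


def _mix_h1(h1, k1):
    h1 ^= k1
    h1 = _rotl32(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK32


def _fmix(h1, length):
    # Final avalanche step; `length` is the input size in bytes.
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1


def hash_int(value, seed=42):
    """Hash one 32-bit integer; callers may override the default seed."""
    h1 = _mix_h1(seed & MASK32, _mix_k1(value & MASK32))
    h = _fmix(h1, 4)
    # Reinterpret the unsigned result as a signed 32-bit integer.
    return h - (1 << 32) if h & 0x80000000 else h
```

Keeping 42 as the default leaves existing callers unchanged, while hash_int(x, seed=0) lets a caller opt into a different seed. A SQL conf would change the default globally instead, which is the trade-off between the two options listed above.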

[VOTE][RESULT] SPIP: Spark Connect

2022-06-16 Thread Herman van Hovell
The vote passes with 17 +1s (10 binding +1s). +1: Herman van Hovell* Matei Zaharia* Yuming Wang Hyukjin Kwon* Chao Sun L.C. Hsieh* Huaxin Gao Ruifeng Zheng Wenchen Fan* Believer Xiao Li* Reynold Xin* Dongjoon Hyun* Gangliang Wang Yikun Jiang Tom Graves* Holden Karau* 0: None (Tom has voiced

Re: [VOTE][SPIP] Spark Connect

2022-06-13 Thread Herman van Hovell
Let me kick off the voting... +1 On Mon, Jun 13, 2022 at 2:02 PM Herman van Hovell wrote: > Hi all, > > I’d like to start a vote for SPIP: "Spark Connect" > > The goal of the SPIP is to introduce a Dataframe based client/server API > for Spark > > Pl

[VOTE][SPIP] Spark Connect

2022-06-13 Thread Herman van Hovell
Hi all, I’d like to start a vote for SPIP: "Spark Connect". The goal of the SPIP is to introduce a DataFrame-based client/server API for Spark. Please also refer to: - Previous discussion in dev mailing list: [DISCUSS] SPIP: Spark Connect - A client and server interface for Apache Spark.

Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Herman van Hovell
+1 On Mon, Jun 13, 2022 at 12:53 PM Wenchen Fan wrote: > +1, tests are all green and there are no more blocker issues AFAIK. > > On Fri, Jun 10, 2022 at 12:27 PM Maxim Gekk > wrote: > >> Please vote on releasing the following candidate as >> Apache Spark version 3.3.0. >> >> The vote is open

Re: [VOTE] Release Spark 2.4.8 (RC2)

2021-04-13 Thread Herman van Hovell
+1 On Tue, Apr 13, 2021 at 2:40 AM sarutak wrote: > +1 (non-binding) > > > +1 > > > > On Tue, 13 Apr 2021, 02:58 Sean Owen, wrote: > > > >> +1 same result as last RC for me. > >> > >> On Mon, Apr 12, 2021, 12:53 AM Liang-Chi Hsieh > >> wrote: > >> > >>> Please vote on releasing the following

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-22 Thread Herman van Hovell
+1 On Mon, Feb 22, 2021 at 12:59 PM Jungtaek Lim wrote: > +1 (non-binding) > > Verified signatures. Only a few commits added after RC2 which don't seem > to change the SS behavior, so I'd carry over my +1 from RC2. > > On Mon, Feb 22, 2021 at 3:57 PM Hyukjin Kwon wrote: > >> Starting with my

Re: [VOTE] Release Spark 3.0.2 (RC1)

2021-02-16 Thread Herman van Hovell
+1 On Tue, Feb 16, 2021 at 11:08 AM Hyukjin Kwon wrote: > +1 > > On Tue, Feb 16, 2021 at 5:10 PM, Prashant Sharma wrote: > >> +1 >> >> On Tue, Feb 16, 2021 at 1:22 PM Dongjoon Hyun >> wrote: >> >>> Please vote on releasing the following candidate as Apache Spark version >>> 3.0.2. >>> >>> The vote is

Re: [VOTE] Standardize Spark Exception Messages SPIP

2020-11-09 Thread Herman van Hovell
+1 On Mon, Nov 9, 2020 at 2:06 AM Takeshi Yamamuro wrote: > +1 > > On Thu, Nov 5, 2020 at 3:41 AM Xinyi Yu wrote: > >> Hi all, >> >> We had the discussion of SPIP: Standardize Spark Exception Messages at >> >>

Re: Welcoming some new Apache Spark committers

2020-07-15 Thread Herman van Hovell
Congratulations! On Wed, Jul 15, 2020 at 9:00 AM angers.zhu wrote: > Congratulations ! > > angers.zhu > angers@gmail.com > >