Re: [DISCUSS] Spark 4.0.0 release

2024-05-08 Thread Holden Karau
Is there some point of contact that can provide me needed context and >> permissions? >> I'd also love to see why the costs are high and see how we can reduce >> them... >> >> Thanks, >> Nimrod >> >> On Wed, May 8, 2024 at 8:26 AM Holden Karau >

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
> will be automated and the only thing which will be manual is to sign the > release for security reasons that would be reasonable. > > Thanks, > Nimrod > > > On Wed, May 8, 2024 at 00:54, Holden Karau < > holden.ka...@gmail.com> wrote: > >> Indeed. We could concei

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
ore, my pgp >> key is lost, etc.). I'll start the RC process at my tomorrow. Thanks for >> your patience! >> >> Wenchen >> >> On Fri, May 3, 2024 at 7:47 AM yangjie01 wrote: >> >>> +1 >>> >>> >>> >>> *From**: *Jun

Re: ASF board report draft for May

2024-05-06 Thread Holden Karau
gt;>>> >>>> In addition, Apache Spark PMC received an official notice from ASF >>>> Infra team. >>>> >>>> https://lists.apache.org/thread/rgy1cg17tkd3yox7qfq87ht12sqclkbg >>>> > [NOTICE] Apache Spark's GitHub Actions us

Re: ASF board report draft for May

2024-05-06 Thread Holden Karau
possible, we >> opened a blocker-level JIRA issue and have been working on it. >> - https://infra.apache.org/github-actions-policy.html >> >> Please include a sentence that Apache Spark PMC is working on under the >> following umbrella JIRA issue. >> >

Re: ASF board report draft for May

2024-05-05 Thread Holden Karau
Do we want to include that we’re planning on having a preview release of Spark 4 so folks can see the APIs “soon”? Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams:

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Holden Karau
+1 :) yay previews On Wed, May 1, 2024 at 5:36 PM Chao Sun wrote: > +1 > > On Wed, May 1, 2024 at 5:23 PM Xiao Li wrote: > >> +1 for next Monday. >> >> We can do more previews when the other features are ready for preview. >> >> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote: >> >>> Next week sounds

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Holden Karau
+1 Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh wrote: > +1 > > On Fri, Apr 26, 2024

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Holden Karau
+1 Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau On Thu, Apr 25, 2024 at 11:18 AM Maciej wrote: > +1 > > Best regards, > Maciej

Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Holden Karau
+1 -- even if it's not perfect now is the time to change default values On Sat, Apr 13, 2024 at 4:11 PM Hyukjin Kwon wrote: > +1 > > On Sun, Apr 14, 2024 at 7:46 AM Chao Sun wrote: > >> +1. >> >> This feature is very helpful for guarding against correctness issues, >> such as null results due

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Holden Karau
On Wed, Apr 10, 2024 at 9:54 PM Binwei Yang wrote: > > Gluten currently already support Velox backend and Clickhouse backend. > data fusion support is also proposed but no one worked on it. > > Gluten isn't a POC. It's under actively developing but some companies > already used it. > > > On

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Holden Karau
I like the idea of improving flexibility of Spark's physical plans and really anything that might reduce code duplication among the ~4 or so different accelerators. Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9

Re: Apache Spark 3.4.3 (?)

2024-04-06 Thread Holden Karau
Sounds good to me :) Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau On Sat, Apr 6, 2024 at 2:51 PM Dongjoon Hyun wrote: > Hi, All.

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Holden Karau
+1 Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau On Mon, Apr 1, 2024 at 5:44 PM Xinrong Meng wrote: > +1 > > Thank you @Hyukjin

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread Holden Karau
+1 Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams: https://www.youtube.com/user/holdenkarau On Mon, Mar 11, 2024 at 7:44 PM Reynold Xin wrote: > +1 > > > On Mon, Mar 11 2024

Re: Generating config docs automatically

2024-02-21 Thread Holden Karau
I think this is a good idea. I like having everything in one source of truth rather than two (so option 1 sounds like a good idea); but that’s just my opinion. I'd be happy to help with reviews though. On Wed, Feb 21, 2024 at 6:37 AM Nicholas Chammas wrote: > I know config documentation is not

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread Holden Karau
This looks really cool :) Out of interest, what are the differences in the approach between this and Gluten? On Tue, Feb 13, 2024 at 12:42 PM Chao Sun wrote: > Hi all, > > We are very happy to announce that Project Comet, a plugin to > accelerate Spark query execution via leveraging DataFusion

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-16 Thread Holden Karau
Oh interesting solution, a co-worker was suggesting something similar using resource profiles to increase memory -- but your approach avoids a lot of complexity I like it (and we could extend it out to support resource profile growth too). I think an SPIP sounds like a great next step. On Tue,

Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-14 Thread Holden Karau
+1 On Tue, Nov 14, 2023 at 10:21 AM DB Tsai wrote: > +1 > > DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > > On Nov 14, 2023, at 10:14 AM, Vakaris Baškirov < > vakaris.bashki...@gmail.com> wrote: > > +1 (non-binding) > > > On Tue, Nov 14, 2023 at 8:03 PM Chao Sun wrote: > >> +1

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Holden Karau
To be clear: I am generally supportive of the idea (+1) but have some follow-up questions: Have we taken the time to learn from the other operators? Do we have a compatible CRD/API or not (and if so why?) The API seems to assume that everything is packaged in the container in advance, but I

Re: Apache Spark 3.4.2 (?)

2023-11-06 Thread Holden Karau
+1 On Mon, Nov 6, 2023 at 4:30 PM yangjie01 wrote: > +1 > > > > *From**: *Yuming Wang > *Date**: *Tuesday, November 7, 2023 07:00 > *To**: *Santosh Pingale > *Cc**: *Dongjoon Hyun , dev > > *Subject**: *Re: Apache Spark 3.4.2 (?) > > > > +1 > > > > On Tue, Nov 7, 2023 at 3:55 AM Santosh Pingale > wrote: > >

Re: Write Spark Connection client application in Go

2023-09-12 Thread Holden Karau
That’s so cool! Great work y’all :) On Tue, Sep 12, 2023 at 8:14 PM bo yang wrote: > Hi Spark Friends, > > Anyone interested in using Golang to write Spark application? We created a > Spark > Connect Go Client library . > Would love to hear

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-07 Thread Holden Karau
+1 pip installing seems to function :) On Thu, Sep 7, 2023 at 7:22 PM Yuming Wang wrote: > +1. > > On Thu, Sep 7, 2023 at 10:33 PM yangjie01 > wrote: > >> +1 >> >> >> >> *From**: *Gengliang Wang >> *Date**: *Thursday, September 7, 2023 12:53 >> *To**: *Yuanjian Li >> *Cc**: *Xiao Li ,

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-09-02 Thread Holden Karau
Can we delay the next RC cut until after Labor Day? On Sat, Sep 2, 2023 at 9:59 PM Yuanjian Li wrote: > Thank you for all the reports! > The vote has failed. I plan to cut RC4 in two days. > > @Dipayan Dev I quickly skimmed through the > corresponding ticket, and it doesn't seem to be a

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-23 Thread Holden Karau
d >>> London >>> United Kingdom >>> >>> >>>view my Linkedin profile >>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>> >>> >>> https://en.everybodywiki.com/Mich_Talebzadeh >>> >>>

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Holden Karau
Mich Talebzadeh >>>> wrote: >>>> >>>> >>>> >>>> Hi, >>>> >>>> >>>> >>>> From what I have seen spark on a serverless cluster has hard up getting >>>> the driver going in a time

Re: ASF board report draft for August 2023

2023-08-08 Thread Holden Karau
Maybe add a link to the 4.0 JIRA where we are tracking the current plans for 4.0? On Tue, Aug 8, 2023 at 9:33 AM Dongjoon Hyun wrote: > Thank you, Matei. > > It looks good to me. > > Dongjoon > > On Mon, Aug 7, 2023 at 22:54 Matei Zaharia > wrote: > >> It’s time to send our quarterly report to

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Holden Karau
Oooh fascinating. I’m going on call this week so it will take me awhile but I do want to review this :) On Mon, Aug 7, 2023 at 5:30 PM Pavan Kotikalapudi wrote: > Hi Spark Dev, > > I have extended traditional DRA to work for structured streaming > use-case. > > Here is an initial Implementation

Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Holden Karau
Oh great point On Mon, Aug 7, 2023 at 2:23 PM bo yang wrote: > Thanks Holden for bringing this up! > > Maybe another thing to think about is how to make dynamic allocation more > friendly with Kubernetes and disaggregated shuffle storage? > > > > On Mon, Aug 7, 2023

Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Holden Karau
So I'm wondering if there is interest in revisiting some of how Spark is doing its dynamic allocation for Spark 4+? Some things that I've been thinking about: - Advisory user input (e.g. a way to say after X is done I know I need Y where Y might be a bunch of GPU machines) - Configurable

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread Holden Karau
+1 On Fri, Jul 7, 2023 at 9:55 AM huaxin gao wrote: > +1 > > On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh > wrote: > >> +1 for me >> >> Mich Talebzadeh, >> Solutions Architect/Engineering Lead >> Palantir Technologies Limited >> London >> United Kingdom >> >> >>view my Linkedin profile

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Holden Karau
? On Wed, Jun 21, 2023 at 8:30 AM Reynold Xin wrote: > +1 > > This is a great idea. > > > On Wed, Jun 21, 2023 at 8:29 AM, Holden Karau > wrote: > >> I’d like to start with a +1, better Python testing tools integrated into >> the project make sense. >> >

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Holden Karau
I’d like to start with a +1, better Python testing tools integrated into the project make sense. On Wed, Jun 21, 2023 at 8:11 AM Amanda Liu wrote: > Hi all, > > I'd like to start the vote for SPIP: PySpark Test Framework. > > The high-level summary for the SPIP is that it proposes an official

Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-20 Thread Holden Karau
ut if it only entails changing to >>> Scala 2.13 and dropping support for JDK 8, then we could also just release >>> a month after 3.5. >>> >>> How about we do this? We get 3.5 released, and afterwards we do a couple >>> of meetings where we build this road

Re: Gauging interest in: ScalaFix + Scala Steward for Spark 4.0

2023-06-12 Thread Holden Karau
Yup I think building consensus on what goes in 4.X is something we’ll need to do. On Mon, Jun 12, 2023 at 11:56 AM Dongjoon Hyun wrote: > Thank you for sharing those. I'm also interested in taking advantage of > it. Also, I hope `spark-upgrade` can help us in line with Spark 4.0. > > However,

Gauging interest in: ScalaFix + Scala Steward for Spark 4.0

2023-06-12 Thread Holden Karau
Myself and a few folks have been working on a spark-upgrade project (focused on getting folks onto current versions of Spark). Since it looks like we're starting the discussion around Spark 4 I was thinking now could be a good time for us to consider if we want to try and integrate auto-upgrade

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Holden Karau
-0 I'd like to see more of a doc around what we're planning on for a 4.0 before we pick a target release date etc. (feels like cart before the horse). But it's a weak preference. On Mon, Jun 12, 2023 at 11:24 AM Xiao Li wrote: > Thanks for starting the vote. > > I do have a concern about the

Re: JDK version support policy?

2023-06-07 Thread Holden Karau
So JDK 11 is still supported in OpenJDK until 2026. I'm not sure if we're going to see enough folks moving to JRE 17 by the Spark 4 release; unless we have a strong benefit from dropping 11 support I'd be inclined to keep it. On Tue, Jun 6, 2023 at 9:08 PM Dongjoon Hyun wrote: > I'm also +1 on

Re: ASF policy violation and Scala version issues

2023-06-06 Thread Holden Karau
So I think if the Spark PMC wants to ask Databricks something that could be reasonable (although I'm a little fuzzy as to the ask), but that conversation might belong on private@ (I could be wrong of course). On Tue, Jun 6, 2023 at 3:29 AM Mich Talebzadeh wrote: > I concur with you Sean. > > If

Re: Slack for Spark Community: Merging various threads

2023-04-07 Thread Holden Karau
I think there was some concern around how to make any sync channel show up in logs / index / search results? On Fri, Apr 7, 2023 at 9:41 AM Dongjoon Hyun wrote: > Thank you, All. > > I'm very satisfied with the focused and right questions for the real > issues by removing irrelevant claims. :)

Re: Apache Spark 3.2.4 EOL Release?

2023-04-04 Thread Holden Karau
+1 On Tue, Apr 4, 2023 at 11:04 AM L. C. Hsieh wrote: > +1 > > Sounds good and thanks Dongjoon for driving this. > > On 2023/04/04 17:24:54 Dongjoon Hyun wrote: > > Hi, All. > > > > Since Apache Spark 3.2.0 passed RC7 vote on October 12, 2021, branch-3.2 > > has been maintained and served well

Re: Ammonite as REPL for Spark Connect

2023-03-22 Thread Holden Karau
I am +1 to the general concept of including Ammonite magic. On Wed, Mar 22, 2023 at 4:58 PM Herman van Hovell wrote: > Ammonite is maintained externally by Li Haoyi et al. We are including it > as a 'provided' dependency. The integration bits and pieces (1 file) are > included in Apache

Re: SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Holden Karau
Is there someone focused on streaming work these days who would want to shepherd this? On Sat, Feb 18, 2023 at 5:02 PM Dongjoon Hyun wrote: > Thank you for considering me, but may I ask what makes you think to put me > there, Mich? I'm curious about your reason. > > > I have put dongjoon.hyun

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Holden Karau
That’s legit, if the patch author isn’t comfortable with a backport then let’s leave it be. On Mon, Feb 13, 2023 at 9:59 AM Dongjoon Hyun wrote: > Hi, All. > > As the author of that `Improvement` patch, I strongly disagree with giving > the wrong idea which Python 3.11 is officially supported

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Holden Karau
I’d be in favor of backporting with the idea it's a bug fix for a language (admittedly not a version we’ve supported before) On Mon, Feb 13, 2023 at 9:19 AM L. C. Hsieh wrote: > If it is not supported in Spark 3.3.x, it looks like an improvement at > Spark 3.4. > For such cases we usually do

Re: Spark on Kube (virtua) coffee/tea/pop times

2023-02-13 Thread Holden Karau
technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Fri, 10 Feb 2023 at 18:58, Holden Karau wrote: > >> Ok so the first iteration of this is booked

Re: Spark on Kube (virtua) coffee/tea/pop times

2023-02-10 Thread Holden Karau
le for any monetary damages arising from > such loss, damage or destruction. > > > > > On Wed, 8 Feb 2023 at 20:12, Holden Karau wrote: > >> My thought here was that it's more focused on getting to understand each >> other's goals / priorities and less solving

Re: Spark on Kube (virtua) coffee/tea/pop times

2023-02-08 Thread Holden Karau
>>>>>>> Greetings everyone! >>>>>>>>> I am super new to this group and currently leading some work to >>>>>>>>> deploy spark on k8 for my company o9 Solutions. >>>>>>>>> I would love to join the discus

Re: Spark on Kube (virtua) coffee/tea/pop times

2023-02-08 Thread Holden Karau
doodle for the following week with more european friendly times. Let me know what folks think :) On Tue, Feb 7, 2023 at 3:23 PM Holden Karau wrote: > Hi Folks, > > It seems like we could maybe use some additional shared context around > Spark on Kube so I’d like to try and schedule a vi

Re: Spark on Kube (virtua) coffee/tea/pop times

2023-02-07 Thread Holden Karau
use > spark. > > Thanks! > Andrew > > On Tue, Feb 7, 2023 at 5:24 PM Holden Karau wrote: > > > > Hi Folks, > > > > It seems like we could maybe use some additional shared context around > Spark on Kube so I’d like to try and schedule a virtual coffe

Spark on Kube (virtua) coffee/tea/pop times

2023-02-07 Thread Holden Karau
Hi Folks, It seems like we could maybe use some additional shared context around Spark on Kube so I’d like to try and schedule a virtual coffee session. Who all would be interested in virtual adventures around Spark on Kube development? No pressure if the idea of hanging out in a virtual chat

Re: Syndicate Apache Spark Twitter to Mastodon?

2022-12-01 Thread Holden Karau
r) > For Federated features, I think Slack would be a better platform, a lot > of Apache Big data projects have slack for federated features > > Thu, Dec 1, 2022, 02:33 Holden Karau : > >> I agree that there is probably a majority still on twitter, but it would >> be a syndica

Re: Syndicate Apache Spark Twitter to Mastodon?

2022-11-30 Thread Holden Karau
devs are still using Twitter. > > > Thu, Dec 1, 2022, 01:35 Holden Karau : > >> Do we want to start syndicating Apache Spark Twitter to a Mastodon >> instance. It seems like a lot of software dev folks are moving over there >> and it would be good to reach our users wh

Syndicate Apache Spark Twitter to Mastodon?

2022-11-30 Thread Holden Karau
Do we want to start syndicating Apache Spark Twitter to a Mastodon instance. It seems like a lot of software dev folks are moving over there and it would be good to reach our users where they are. Any objections / concerns? Any thoughts on which server we should pick if we do this? -- Twitter:

Re: Jupyter notebook on Dataproc versus GKE

2022-09-06 Thread Holden Karau
rise from relying on this email's technical content is explicitly >>> disclaimed. The author will in no case be liable for any monetary damages >>> arising from such loss, damage or destruction. >>> >>> >>> >>> >>> On Mon, 5 Sept 2022 at 20:5

Re: Jupyter notebook on Dataproc versus GKE

2022-09-05 Thread Holden Karau
f data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Mon, 5 Sept 2022 at 12:47, Hold

Re: Jupyter notebook on Dataproc versus GKE

2022-09-05 Thread Holden Karau
I’ve run Jupyter w/Spark on K8s, haven’t tried it with Dataproc personally. The Spark K8s pod scheduler is now more pluggable for Yunikorn and Volcano can be used with less effort. On Mon, Sep 5, 2022 at 7:44 AM Mich Talebzadeh wrote: > > Hi, > > > Has anyone got experience of running Jupyter

Re: [SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-06-20 Thread Holden Karau
How about a hallway meet up at Data AI summit to talk about build CI if folks are Interested? On Sun, Jun 19, 2022 at 7:50 PM Hyukjin Kwon wrote: > Increased the priority to a blocker - I don't think we can release with > these build failures and poor CI > > On Mon, 20 Jun 2022 at 10:39,

Re: [VOTE][SPIP] Spark Connect

2022-06-16 Thread Holden Karau
+1 On Thu, Jun 16, 2022 at 7:17 AM Thomas Graves wrote: > +1 for the concept. > Correct me if I'm wrong, but at a high level this is proposing adding > a new user API (which is language agnostic) and the proposal is to > start with something like the Logical Plan, with the addition of being >

Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Holden Karau
+1 On Mon, Jun 13, 2022 at 4:51 PM Yuming Wang wrote: > +1 (non-binding) > > On Tue, Jun 14, 2022 at 7:41 AM Dongjoon Hyun > wrote: > >> +1 >> >> Thanks, >> Dongjoon. >> >> On Mon, Jun 13, 2022 at 3:54 PM Chris Nauroth >> wrote: >> >>> +1 (non-binding) >>> >>> I repeated all checks I

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread Holden Karau
Could we make it do the same sort of history server fallback approach? On Tue, May 17, 2022 at 10:41 PM bo yang wrote: > It is like Web Application Proxy in YARN ( > https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html), > to provide easy access for Spark

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread Holden Karau
Oh that’s rad! On Tue, May 17, 2022 at 7:47 AM bo yang wrote: > Hi Spark Folks, > > I built a web reverse proxy to access Spark UI on Kubernetes (working > together with https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). > Want to share here in case other people have similar need.

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-11 Thread Holden Karau
> On Wed, May 11, 2022 at 4:23 AM Hyukjin Kwon wrote: > >> I expect to see RC2 too. I guess he just sticks to the standard, leaving >> the vote open till the end. >> It hasn't got enough +1s anyway :-). >> >> On Wed, 11 May 2022 at 10:17, Holden Karau wrote: >>

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-10 Thread Holden Karau
Technically releases don't follow vetoes (see https://www.apache.org/foundation/voting.html ); it's up to the RM if they get the minimum number of binding +1s (although they are encouraged to cancel the release if any serious issues are raised). That being said I'll add my -1 based on the issues

Re: Apache Spark 3.3 Release

2022-03-16 Thread Holden Karau
>> #34659 [SPARK-34863][SQL] Support complex types for Parquet > vectorized reader > > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum > > >> > > >> Do you mean we should include them, or exclude them from 3.3? > > >> > > >> Thanks,

Re: Apache Spark 3.3 Release

2022-03-15 Thread Holden Karau
May I suggest we push out one week (22nd) just to give everyone a bit of breathing space? Rushed software development more often results in bugs. On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang wrote: > > To make our release time more predictable, let us collect the PRs and > wait three more days

Re: Apache Spark 3.3 Release

2022-03-14 Thread Holden Karau
On Mon, Mar 14, 2022 at 11:53 PM Xiao Li wrote: > Could you please list which features we want to finish before the branch > cut? How long will they take? > > Xiao > > On Mon, Mar 14, 2022 at 13:30, Chao Sun wrote: > >> Hi Max, >> >> As there are still some ongoing work for the above listed SPIPs, can we >>

Re: CVE-2021-38296: Apache Spark Key Negotiation Vulnerability

2022-03-09 Thread Holden Karau
CVEs are generally not mentioned in the release notes or JIRA; instead we track them at https://spark.apache.org/security.html once they are resolved (prior to the resolution the reports go to secur...@spark.apache.org) to allow the project time to fix the issue before public disclosure so there

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-25 Thread Holden Karau
maël > > ps. Any plans to make this images official docker images at some point > (for the extra security/validation) [1] > [1] https://docs.docker.com/docker-hub/official_images/ > > On Mon, Feb 21, 2022 at 10:09 PM Holden Karau > wrote: > > > > We are happy t

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-22 Thread Holden Karau
jre-slim-buster latest >>>>> 31ed15daa2bf 12 hours ago >>>>> 531MB >>>>> >>>>> Then push it with (example) >>>>> >>>>> docker push apache/spark/tags/spark-3.1.3-sc

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-21 Thread Holden Karau
ybodywiki.com/Mich_Talebzadeh > > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-21 Thread Holden Karau
g on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Mon, 21 Feb 2022 at 21:09, Holden Karau wrote: > >> We are happy to annou

[ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-21 Thread Holden Karau
We are happy to announce the availability of Spark 3.1.3! Spark 3.1.3 is a maintenance release containing stability fixes. This release is based on the branch-3.1 maintenance branch of Spark. We strongly recommend all 3.1 users to upgrade to this stable release. To download Spark 3.1.3, head

Re: [VOTE] Spark 3.1.3 RC4

2022-02-18 Thread Holden Karau
The vote passes with no 0s or -1s and the following +1: Holden Karau John Zhuge Mridul Muralidharan Thomas graves Gengliang Wang Wenchen Fan Yuming Wang Ruifeng Zheng Sean Owen I will begin finalizing the release now. On Fri, Feb 18, 2022 at 2:49 PM Holden Karau wrote: > +1 my s

Re: [VOTE] Spark 3.1.3 RC4

2022-02-18 Thread Holden Karau
ith -Pyarn -Pmesos -Pkubernetes >> >> Regards, >> Mridul >> >> >> On Wed, Feb 16, 2022 at 8:32 AM Thomas graves wrote: >> >>> +1 >>> >>> Tom >>> >>> On Mon, Feb 14, 2022 at 2:55 PM Holden Karau >>> wrote:

[VOTE] Spark 3.1.3 RC4

2022-02-14 Thread Holden Karau
Please vote on releasing the following candidate as Apache Spark version 3.1.3. The vote is open until Feb. 18th at 1 PM Pacific (9 PM GMT) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.1.3 [ ] -1 Do not release this

Re: [VOTE] Spark 3.1.3 RC3

2022-02-08 Thread Holden Karau
Yup, I’ve run into some weirdness with docs again I want to verify before I send the vote email though. On Mon, Feb 7, 2022 at 10:06 PM Wenchen Fan wrote: > Shall we use the release scripts of branch 3.1 to release 3.1? > > On Fri, Feb 4, 2022 at 4:57 AM Holden Karau wrote: > &

Re: [VOTE] Spark 3.1.3 RC3

2022-02-03 Thread Holden Karau
ecember (Dec 6) when we were talking about release 3.2.1. >>>> >>>> Tom >>>> >>>> On Wed, Feb 2, 2022 at 2:07 AM Mridul Muralidharan >>>> wrote: >>>> > >>>> > Hi Holden, >>>> > >>>> > N

Re: [VOTE] SPIP: Catalog API for view metadata

2022-02-03 Thread Holden Karau
+1 (binding) On Thu, Feb 3, 2022 at 2:26 PM Erik Krogen wrote: > +1 (non-binding) > > Really looking forward to having this natively supported by Spark, so that > we can get rid of our own hacks to tie in a custom view catalog > implementation. I appreciate the care John has put into various

[VOTE] Spark 3.1.3 RC3

2022-02-01 Thread Holden Karau
Please vote on releasing the following candidate as Apache Spark version 3.1.3. The vote is open until Feb. 4th at 5 PM PST (1 AM UTC + 1 day) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.1.3 [ ] -1 Do not release

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Holden Karau
On Fri, Jan 21, 2022 at 6:48 PM Sean Owen wrote: > Continue on the ticket - I am not sure this is established. We would block > a release for critical problems that are not regressions. This is not a > data loss / 'deleting data' issue even if valid. > You're welcome to provide feedback but

Re: Tries on migrating Spark Linux arm64 Job from Jenkins to GitHub Actions

2022-01-08 Thread Holden Karau
Personally I’d love to see us compiling and testing on Linux arm64 as well. On Sat, Jan 8, 2022 at 7:49 PM Yikun Jiang wrote: > BTW, this is not intended to be in potential opposition to Apache Spark > Infra 2022 which dongjoon mentioned in "Apache Spark Jenkins Infra 2022". > It is just to

Re: [VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-05 Thread Holden Karau
+1 (binding) On Wed, Jan 5, 2022 at 5:31 PM William Wang wrote: > +1 (non-binding) > > On Thu, Jan 6, 2022 at 09:07, Yikun Jiang wrote: > >> Hi all, >> >> I’d like to start a vote for SPIP: "Support Customized Kubernetes >> Schedulers Proposal" >> >> The SPIP is to support customized Kubernetes schedulers in

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2022-01-05 Thread Holden Karau
>>>>>>> Definitely yes, we are on the same page. >>>>>>>> >>>>>>>> I think we have the same goal: propose a general and reasonable >>>>>>>> mechanism to make spark on k8s with a custom scheduler more usabl

Re: Log4j 1.2.17 spark CVE

2021-12-12 Thread Holden Karau
My understanding is it only applies to log4j 2+ so we don’t need to do anything. On Sun, Dec 12, 2021 at 8:46 PM Pralabh Kumar wrote: > Hi developers, users > > Spark is built using log4j 1.2.17 . Is there a plan to upgrade based on > recent CVE detected ? > > > Regards > Pralabh kumar > --

Re: [Apache Spark Jenkins] build system shutting down Dec 23th, 2021

2021-12-06 Thread Holden Karau
Shane you kick ass thank you for everything you’ve done for us :) Keep on rocking :) On Mon, Dec 6, 2021 at 4:24 PM Hyukjin Kwon wrote: > Thanks, Shane. > > On Tue, 7 Dec 2021 at 09:19, Dongjoon Hyun > wrote: > >> I really want to thank you for all your help. >> You've done so many things for

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2021-11-30 Thread Holden Karau
Thanks for putting this together, I’m really excited for us to add better batch scheduling integrations. On Tue, Nov 30, 2021 at 12:46 AM Yikun Jiang wrote: > Hey everyone, > > I'd like to start a discussion on "Support Volcano/Alternative Schedulers > Proposal". > > This SPIP is proposed to

Re: DataFrame.mapInArrow

2021-11-10 Thread Holden Karau
Sorry I've been busy, I'll try and take a look tomorrow, excited to see this progress though :) On Wed, Nov 10, 2021 at 9:01 PM Hyukjin Kwon wrote: > Last reminder: I plan to merge this in a few more days. Any feedback and > review would be very appreciated. > > On Tue, 9 Nov 2021 at 21:51,

Re: [VOTE] SPIP: Storage Partitioned Join for Data Source V2

2021-10-29 Thread Holden Karau
+1 On Fri, Oct 29, 2021 at 3:07 PM DB Tsai wrote: > +1 > > DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > > > On Fri, Oct 29, 2021 at 11:42 AM Ryan Blue wrote: > >> +1 >> >> On Fri, Oct 29, 2021 at 11:06 AM huaxin gao >> wrote: >> >>> +1 >>> >>> On Fri, Oct 29, 2021 at 10:59

Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-10 Thread Holden Karau
+1 On Sun, Oct 10, 2021 at 10:46 PM Wenchen Fan wrote: > +1 > > On Sat, Oct 9, 2021 at 2:36 PM angers zhu wrote: > >> +1 (non-binding) >> >> On Sat, Oct 9, 2021 at 2:06 PM, Cheng Pan wrote: >> >>> +1 (non-binding) >>> >>> Integration test passed[1] with my project[2]. >>> >>> [1] >>>

Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-09-29 Thread Holden Karau
PySpark smoke tests pass, I'm going to do a last pass through the JIRAs before my vote though. On Wed, Sep 29, 2021 at 8:54 AM Sean Owen wrote: > +1 looks good to me as before, now that a few recent issues are resolved. > > > On Tue, Sep 28, 2021 at 10:45 AM Gengliang Wang wrote: > >> Please

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Holden Karau
I think even if we do cancel this RC we should leave it open for a bit to see if we can catch any other errors. On Mon, Sep 27, 2021 at 12:29 PM Dongjoon Hyun wrote: > Unfortunately, it's the same for me recently. Not only that, but I also > hit MetaspaceSize OOM, too. > I ended up with

Adding Spark 4 to JIRA for targetted versions

2021-09-13 Thread Holden Karau
Hi Folks, I'm going through the Spark 3.2 tickets just to make sure we're not missing anything important and I was wondering what folks' thoughts are on adding Spark 4 so we can target API breaking changes to the next major version and avoid losing track of the issue. Cheers, Holden :) --

Re: Add option to Spark UI to proxy to the executors?

2021-08-25 Thread Holden Karau
So I tried turning on the Spark exec UI proxy but it broke the Spark UI (in 3.1.2) and regardless of what URL I requested everything came back as text/html of the jobs page. Is anyone actively using this feature in prod? On Sun, Aug 22, 2021 at 5:58 PM Holden Karau wrote: > Oh cool. I’ll h

Re: Add option to Spark UI to proxy to the executors?

2021-08-22 Thread Holden Karau
>> >> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for >> any loss, damage or destruction of data or any other property which may >> arise from relying on this email's technical content is explicitly >> disclaimed. The author will in no case be l

Add option to Spark UI to proxy to the executors?

2021-08-20 Thread Holden Karau
Hi Folks, I'm wondering what people think about the idea of having the Spark UI (optionally) act as a proxy to the executors? This could help with exec UI access in some deployment environments. Cheers, Holden :) -- Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High

-1s on committed but not released code?

2021-08-19 Thread Holden Karau
Hi Y'all, This just recently came up but I'm not super sure on how we want to handle this in general. If code was committed under the lazy consensus model and then a committer or PMC -1s it post merge, what do we want to do? I know we had some previous discussion around -1s, but that was largely

Re: Time to start publishing Spark Docker Images?

2021-08-17 Thread Holden Karau
>>>>>>> correctly it was around 400MB for existing images). >>>>>>> >>>>>>> >>>>>>> On 8/17/21 2:24 PM, Mich Talebzadeh wrote: >>>>>>> >>>>>>> Examples: >>>>

Re: Time to start publishing Spark Docker Images?

2021-08-17 Thread Holden Karau
> users/organisations. My suggestions is to create for a given type (spark, >> spark-py etc): >> >> >>1. One vanilla flavour for everyday use with few useful packages >>2. One for medium use with most common packages for ETL/ELT stuff >>3. One s

Re: Time to start publishing Spark Docker Images?

2021-08-16 Thread Holden Karau
n.com/in/mich-talebzadeh-ph-d-5205b2/> > > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The a
