Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
I think signing the artifacts produced from a secure CI sounds like a good idea. I know we’ve been asked to reduce our GitHub Actions usage, but perhaps someone interested could volunteer to set that up.

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Nimrod Ofek
Hi, Thanks for the reply. From my experience, a build on a build server would be much more predictable and less error-prone than building on some laptop, and of course much faster for producing builds: snapshots, early preview releases, release candidates, or final releases. It

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
Indeed. We could conceivably build the release in CI/CD, but the final verification and signing should be done locally to keep the keys safe (there was some concern from earlier release processes).

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Dongjoon Hyun
Thank you so much for the update, Wenchen! Dongjoon.

caching a dataframe in Spark takes lot of time

2024-05-07 Thread Prem Sahoo
Hello Folks, in Spark I have read a file, done some transformations, and finally written the result to HDFS. Now I am interested in writing the same DataFrame to MapR-FS, but for this Spark will execute the full DAG again (recompute all the previous steps: the read plus all the transformations). I don't want
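A minimal sketch of the usual workaround, assuming a Parquet source and placeholder paths and column names (none of these come from the original post): persisting the transformed DataFrame lets the second write reuse the materialized result instead of re-running the whole lineage.

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("dual-write-sketch").getOrCreate()

// Read and transform once; the path and filter below are placeholders.
val df = spark.read.parquet("hdfs:///data/input")
  .filter("value IS NOT NULL")

// Mark the transformed result for reuse so the second write does not
// recompute the read + transformations (the first write still pays the
// cost of materializing it).
df.persist(StorageLevel.MEMORY_AND_DISK)

df.write.mode("overwrite").parquet("hdfs:///data/output")    // first sink (HDFS)
df.write.mode("overwrite").parquet("maprfs:///data/output")  // second sink (MapR-FS)

df.unpersist()
```

If the dataset is too large to persist cheaply, writing it once and then reading it back from the first sink (or copying with a tool such as distcp) is the other common option.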

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Nimrod Ofek
Hi, Sorry for the novice question, Wenchen: is the release done manually from a laptop, not using a CI/CD process on a build server? Thanks, Nimrod

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Wenchen Fan
UPDATE: Unfortunately, it took me quite some time to set up my laptop and get it ready for the release process (Docker Desktop doesn't work anymore, my PGP key is lost, etc.). I'll start the RC process tomorrow (my time). Thanks for your patience! Wenchen

Spark not creating staging dir for insertInto partitioned table

2024-05-07 Thread Sanskar Modi
Hi Folks, I wanted to check why Spark doesn't create a staging dir while doing an insertInto on partitioned tables. I'm running the example code below:

```
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
val rdd = sc.parallelize(Seq((1, 5, 1), (2, 1, 2), (4, 4, 3)))
val df =
```
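For reference, a hypothetical completion of the truncated reproduction above; the column names, table name, and table DDL are assumptions for illustration, not taken from the original post.

```
// Assumed target table (not from the original post):
//   CREATE TABLE example_tbl (id INT, value INT) PARTITIONED BY (part INT) STORED AS PARQUET
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

import spark.implicits._
val rdd = sc.parallelize(Seq((1, 5, 1), (2, 1, 2), (4, 4, 3)))
val df  = rdd.toDF("id", "value", "part")

// insertInto resolves columns by position, so the dynamic partition column goes last.
df.write.mode("overwrite").insertInto("example_tbl")
```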