Re: [EXTERNAL] spark re-use shuffle files not happening

2022-07-16 Thread Koert Kuipers
ion), not cross jobs.
> From: Koert Kuipers
> Sent: Saturday, July 16, 2022 6:43 PM
> To: user
> Subject: [EXTERNAL] spark re-use shuffle files not happening

Re: [EXTERNAL] spark re-use shuffle files not happening

2022-07-16 Thread Shay Elbaz
Spark can reuse shuffle stages within the same job (action), not across jobs.

From: Koert Kuipers
Sent: Saturday, July 16, 2022 6:43 PM
To: user
Subject: [EXTERNAL] spark re-use shuffle files not happening

spark re-use shuffle files not happening

2022-07-16 Thread Koert Kuipers
i have seen many jobs where spark re-uses shuffle files (and skips a stage of a job), which is an awesome feature given how expensive shuffles are, and i generally now assume this will happen. however i feel like i am going a little crazy today. i did the simplest test in spark 3.3.0, basically i