Spark with External Shuffle Service - using saved shuffle files in the event of executor failure

2021-05-12 Thread Chris Thomas
Hi, I am pretty confident I have observed Spark, configured with the External Shuffle Service, continuing to fetch shuffle files from a node after an executor failure, rather than recomputing them as happens without the Shuffle Service. Can anyone confirm this? (I have a SO question 

Fwd: Spark API and immutability

2020-05-25 Thread Chris Thomas
The cache() method on the DataFrame API caught me out. Having learnt that DataFrames are built on RDDs and that RDDs are immutable, when I saw the statement df.cache() in our codebase I thought ‘This must be a bug, the result is not assigned, the statement will have no effect.’ However, I’ve
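
The surprise described in this snippet can be illustrated with a small toy model. This is only a sketch and not Spark code: ToyFrame, its is_cached flag and its collect() method are hypothetical names invented for the illustration. It assumes what the PySpark and Scala APIs document for cache(): the call marks the existing DataFrame for caching and returns that same object, so the data stays immutable while the handle's bookkeeping changes.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame handle; this is not Spark code."""

    def __init__(self, rows):
        self._rows = tuple(rows)   # the data itself is never modified
        self.is_cached = False     # mutable bookkeeping flag on the handle

    def cache(self):
        # Mark this handle for caching and return the same object, so the
        # call takes effect even when the return value is discarded.
        self.is_cached = True
        return self

    def collect(self):
        # Return a fresh list; callers cannot alter the stored rows.
        return list(self._rows)


df = ToyFrame([1, 2, 3])
df.cache()                # result deliberately not assigned
same = df.cache()         # assigning the result yields the same handle

print(df.is_cached)       # True
print(same is df)         # True
print(df.collect())       # [1, 2, 3]
```

In this model an unassigned df.cache() is not a no-op, because the effect lives in the handle's state rather than in a new object. How and when real Spark materialises the cached data is outside what this sketch shows.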