As I understand it, Spark 3.x releases do not currently support the external shuffle service on Kubernetes. Is there a timeline for when this could become available?
For now we have two parameters for dynamic resource allocation:

  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \

The idea is to use dynamic resource allocation where the driver tracks the shuffle files and evicts only those executors that are not storing active shuffle files. So, in a nutshell, in the absence of the external shuffle service the shuffle files are stored on the executors themselves.

The deployment follows the "one-container-per-Pod" model <https://kubernetes.io/docs/concepts/workloads/pods/>, meaning that one node of the cluster runs the driver and each remaining node runs one executor.

If I over-provision my GKE cluster, for example by adding one redundant node and increasing the number of executors by one, I would expect latency to improve. Have there been any benchmarks on this feature?

Thanks
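
P.S. To make the sizing concrete, here is a minimal sketch in Python of how the flags could be generated for a cluster of a given size. It assumes the layout described above (one node for the driver, one executor on each remaining node). The helper names `dynamic_allocation_conf` and `to_submit_args` are hypothetical, and the `minExecutors` value of 1 is my own assumption rather than a recommended setting. `spark.dynamicAllocation.minExecutors` and `spark.dynamicAllocation.maxExecutors` are standard Spark properties, but they are not part of my current job.

```python
import shlex


def dynamic_allocation_conf(cluster_nodes: int) -> dict:
    """Build Spark conf entries for dynamic allocation with shuffle tracking.

    Assumption: one node hosts the driver and every other node hosts
    exactly one executor, so the executor ceiling is cluster_nodes - 1.
    """
    if cluster_nodes < 2:
        raise ValueError("need at least one driver node and one executor node")
    max_executors = cluster_nodes - 1
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        # Assumption: keep at least one executor alive at all times.
        "spark.dynamicAllocation.minExecutors": "1",
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_submit_args(conf: dict) -> list:
    """Turn a conf dict into a flat list of spark-submit '--conf k=v' arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Example: a 4-node cluster, i.e. 1 driver node and up to 3 executors.
    print(shlex.join(to_submit_args(dynamic_allocation_conf(4))))
```

Over-provisioning by one node then amounts to calling `dynamic_allocation_conf(5)` instead of `dynamic_allocation_conf(4)`, which raises the executor ceiling from 3 to 4. Whether that actually improves latency is exactly what I would like to see benchmarked.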