Hi Karuppayya,
thanks for your proposal and for bringing up this issue.
I am very much in favour of a shuffle storage solution that supports
dynamic allocation and tolerates node failure in a K8S environment,
without the burden of managing a Remote Shuffle Service.
I have the following comments:
Your proposed consolidation stage is equivalent to the next reducer
stage in the sense that it reads shuffle data from the earlier map
stage. This requires the executors of the map stage to survive until the
shuffle data are consolidated ("merged" in Spark terminology).
Therefore, I think this passage of your design document is not accurate:
Executors that perform the initial map tasks (shuffle writers) can
be immediately deallocated after writing their shuffle data ...
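To make the lifecycle concern concrete, here is a toy sketch (purely
illustrative, not taken from your design doc): the reduce-side stage below
can only run if the shuffle files written by the map stage are still
reachable, whether from the original executors, an external shuffle
service, or remote storage.

    // Toy example: the map stage writes shuffle files; the groupByKey
    // stage must fetch them, so whatever wrote them has to still be able
    // to serve them (the `spark` SparkSession is assumed to exist).
    val pairs = spark.sparkContext.parallelize(1 to 1000).map(i => (i % 10, i))
    val grouped = pairs.groupByKey()   // shuffle read happens when this is evaluated
    println(grouped.count())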
Since the consolidation stage reads all the shuffle data, why not do
the transformation in that stage? What is the point of deferring the
transformations to another stage?
You mention the "Native Shuffle Block Migration" and say its limitation
is that "It simply shifts the storage burden to other active executors".
Please consider that the migration process can also migrate blocks to
what Spark calls a fallback storage, which essentially copies the
shuffle data to remote storage.
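For reference, shuffle block migration with a remote fallback storage can
be enabled along these lines (a minimal sketch with the config names as I
recall them from Spark's decommissioning feature; the bucket path is just
a placeholder):

    // Sketch: enable executor decommissioning and shuffle block migration,
    // with a remote fallback storage for blocks no peer executor can take.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .config("spark.decommission.enabled", "true")
      .config("spark.storage.decommission.enabled", "true")
      .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
      // copy blocks here when no active executor is available as a target:
      .config("spark.storage.decommission.fallbackStorage.path", "s3a://my-bucket/spark-fallback/")
      .getOrCreate()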
Kind regards,
Enrico
On 13.11.25 at 01:40, karuppayya wrote:
Hi All,
I propose to utilize *Remote Storage as a Shuffle Store, natively in
Spark*.
This approach would fundamentally decouple shuffle storage from
compute nodes, mitigating *shuffle fetch failures* and also helping with
*aggressive downscaling*.
The primary goal is to enhance the *elasticity and resilience* of
Spark workloads, leading to substantial cost optimization opportunities.
*I welcome any initial thoughts or concerns regarding this idea.*
*Looking forward to your feedback!*
JIRA: SPARK-53484 <https://issues.apache.org/jira/browse/SPARK-54327>
SPIP doc
<https://docs.google.com/document/d/1leywkLgD62-MdG7e57n0vFRi7ICNxn9el9hpgchsVnk/edit?tab=t.0#heading=h.u4h68wupq6lw>,
Design doc
<https://docs.google.com/document/d/1tuWyXAaIBR0oVD5KZwYvz7JLyn6jB55_35xeslUEu7s/edit?tab=t.0>
PoC PR <https://github.com/apache/spark/pull/53028>
Thanks,
Karuppayya