Matt Cheah created SPARK-32504:
----------------------------------

             Summary: Shuffle Storage API: Dynamic updates of shuffle metadata
                 Key: SPARK-32504
                 URL: https://issues.apache.org/jira/browse/SPARK-32504
             Project: Spark
          Issue Type: Sub-task
          Components: Shuffle
    Affects Versions: 3.0.0
            Reporter: Matt Cheah


When external storage is used for shuffles through the shuffle storage API, 
it is often desirable to update the shuffle metadata that plugins can 
implement via https://issues.apache.org/jira/browse/SPARK-31801. For example:
 # If data is stored in some replicated manner and the number of replicas 
changes, then we want the metadata stored on the driver to reflect the new 
number of replicas and where they are located.
 # If data is stored on the mapper's local disk, but is asynchronously backed 
up to some external storage medium, then we want to know when a backup is 
available externally.

To achieve this, we would need to pass a hook for updating the shuffle 
metadata to the shuffle executor components at the root of the plugin tree on 
the executor side. The executor would establish an RPC connection with the 
driver and send messages to update shuffle metadata accordingly.
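
A rough sketch of how such a hook could be shaped, to make the proposal 
concrete. Every name below (MapOutputLocations, ShuffleMetadataUpdater, 
DriverShuffleMetadata, ExampleExecutorComponents) is hypothetical and not 
part of Spark, and the driver is modeled as an in-memory map so the sketch 
runs without an RPC layer. In a real implementation the hook would send a 
message over the executor-to-driver RPC connection instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical types for illustration only; none of these exist in Spark.

/** Where the output of one map task can currently be fetched from. */
final class MapOutputLocations {
    final List<String> replicaHosts;   // example 1: current replica locations
    final boolean backedUpExternally;  // example 2: external backup available

    MapOutputLocations(List<String> replicaHosts, boolean backedUpExternally) {
        this.replicaHosts = Collections.unmodifiableList(new ArrayList<>(replicaHosts));
        this.backedUpExternally = backedUpExternally;
    }
}

/** The hook that would be passed to the executor-side plugin root. */
interface ShuffleMetadataUpdater {
    void update(int shuffleId, long mapId, MapOutputLocations locations);
}

/** Stand-in for driver-side state; assumed to sit behind an RPC endpoint. */
final class DriverShuffleMetadata {
    private final Map<String, MapOutputLocations> byMapOutput = new ConcurrentHashMap<>();

    void apply(int shuffleId, long mapId, MapOutputLocations locations) {
        // Last write wins; a real design would need ordering or versioning.
        byMapOutput.put(shuffleId + ":" + mapId, locations);
    }

    MapOutputLocations lookup(int shuffleId, long mapId) {
        return byMapOutput.get(shuffleId + ":" + mapId);
    }
}

/** Executor-side plugin root that receives the hook when it is initialized. */
final class ExampleExecutorComponents {
    private ShuffleMetadataUpdater updater;

    void initialize(ShuffleMetadataUpdater updater) {
        this.updater = updater;
    }

    /** Example 1: the set of replicas for a map output changed. */
    void onReplicasChanged(int shuffleId, long mapId, List<String> hosts) {
        updater.update(shuffleId, mapId, new MapOutputLocations(hosts, false));
    }

    /** Example 2: an asynchronous backup of a local map output finished. */
    void onBackupCompleted(int shuffleId, long mapId, String localHost) {
        updater.update(shuffleId, mapId,
            new MapOutputLocations(Collections.singletonList(localHost), true));
    }
}

final class ShuffleMetadataDemo {
    public static void main(String[] args) {
        DriverShuffleMetadata driver = new DriverShuffleMetadata();
        ExampleExecutorComponents executor = new ExampleExecutorComponents();
        // The method reference stands in for "send an RPC message to the driver".
        executor.initialize(driver::apply);

        executor.onReplicasChanged(0, 7L, Arrays.asList("host-a", "host-b"));
        executor.onBackupCompleted(0, 8L, "host-c");

        System.out.println(driver.lookup(0, 7L).replicaHosts);
        System.out.println(driver.lookup(0, 8L).backedUpExternally);
    }
}
```

Keeping the hook as a single-method interface means the plugin never sees the 
RPC machinery directly, so the transport could change without touching plugin 
code. How concurrent or out-of-order updates should be reconciled on the 
driver is left open here.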



