Thanks for the comments, Reynold. This is an ease-of-use change, and it is
not strictly required (no ease-of-use change is). That said, don't we want
to invest in making Spark easier to configure for the average user, or even
for a user who is just trying out Spark?

Here are my thoughts:

- Why can SortShuffleManager be selected by a short name ("sort") when
other implementations cannot? If spark.shuffle.manager is meant to be a
pluggable API, it seems this mapping should be pluggable as well.

- Plugin developers (my project included) would like to produce a single
plugin jar that works with every version of Spark we support, but the
ShuffleManager API can change in binary-incompatible ways (it's a private
API). As a result, we document setting spark.shuffle.manager to a fully
qualified class that is built for each version of Spark we bundle, which
guarantees a binary-compatible implementation. Being able to register a
short name for the fully qualified shuffle manager class would spare users
from looking up this mapping.

- ShuffleManager is very flexible (for good reasons) and it can be used to
move shuffle data in several ways, such as RDMA, caching, external stores,
etc. That flexibility means working with other open source projects (such
as UCX) that have their own configuration systems. In this specific
example, environment variables are needed to set up UCX for use from the
JVM, with defaults that are particular to our shuffle usage. Today, users
have to look up these configurations and apply them to their application
themselves, so having a way to set up defaults would greatly improve the
user experience.
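
To make the first and third points concrete, here is a minimal sketch of
what a pluggable short-name mapping with defaults could look like. It is
illustrative only and is not the API proposed in the SPIP:
ShuffleManagerRegistry, Registration, com.example.shuffle.ExampleShuffleManager
and the EXAMPLE_VAR setting are all hypothetical names. The only parts taken
from Spark as it is today are the built-in "sort" / "tungsten-sort" aliases
and the fallback of treating an unknown value as a fully qualified class
name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Spark.
final class ShuffleManagerRegistry {

    // What a plugin would export for one short name (assumed shape).
    static final class Registration {
        final String className;
        final Map<String, String> defaultConfs;

        Registration(String className, Map<String, String> defaultConfs) {
            this.className = className;
            this.defaultConfs = new HashMap<>(defaultConfs);
        }
    }

    private final Map<String, Registration> byShortName = new HashMap<>();

    ShuffleManagerRegistry() {
        // The aliases Spark resolves today; both point at SortShuffleManager.
        String sort = "org.apache.spark.shuffle.sort.SortShuffleManager";
        byShortName.put("sort", new Registration(sort, new HashMap<>()));
        byShortName.put("tungsten-sort", new Registration(sort, new HashMap<>()));
    }

    // Called on behalf of a plugin; refuses to shadow an existing short name.
    void register(String shortName, Registration registration) {
        String key = shortName.toLowerCase(Locale.ROOT);
        if (byShortName.putIfAbsent(key, registration) != null) {
            throw new IllegalArgumentException("Short name already registered: " + key);
        }
    }

    // Short name -> class name. Anything unknown is assumed to already be a
    // fully qualified class name, which keeps existing configurations working.
    String resolveClassName(String configured) {
        Registration reg = byShortName.get(configured.toLowerCase(Locale.ROOT));
        return reg != null ? reg.className : configured;
    }

    // Plugin defaults are applied first, so explicit user settings always win.
    Map<String, String> effectiveConfs(String configured, Map<String, String> userConfs) {
        Map<String, String> merged = new HashMap<>();
        Registration reg = byShortName.get(configured.toLowerCase(Locale.ROOT));
        if (reg != null) {
            merged.putAll(reg.defaultConfs);
        }
        merged.putAll(userConfs);
        return merged;
    }
}
```

With something along these lines, a user would set spark.shuffle.manager to
the plugin's short name and pick up its defaults (for instance a
spark.executorEnv.* entry), while anything they set explicitly would still
take precedence.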

Thanks again for your feedback!

Alessandro

On Sat, Nov 4, 2023 at 6:04 PM Reynold Xin <r...@databricks.com> wrote:

> Why do we need this? The reason data source APIs need it is because it
> will be used by very unsophisticated end users and used all the time (for
> each connection / query). Shuffle is something you set up once, presumably
> by fairly sophisticated admins / engineers.
>
>
>
> On Sat, Nov 04, 2023 at 2:42 PM, Alessandro Bellina <abell...@gmail.com>
> wrote:
>
>> Hello devs,
>>
>> I would like to start a discussion on the SPIP "ShuffleManager short name
>> registration via SparkPlugin".
>>
>> The idea behind this change is to allow a driver plugin (spark.plugins)
>> to export ShuffleManagers via short names, along with sensible default
>> configurations. Users can then use this short name to enable this
>> ShuffleManager + configs using spark.shuffle.manager.
>>
>> SPIP:
>> https://docs.google.com/document/d/1flijDjMMAAGh2C2k-vg1u651RItaRquLGB_sVudxf6I/edit#heading=h.vqpecs4nrsto
>> JIRA: https://issues.apache.org/jira/browse/SPARK-45792
>>
>> I look forward to hearing your feedback.
>>
>> Thanks
>>
>> Alessandro
>>
>
>
