So I was wondering if there is interest in revisiting some of how Spark does its dynamic allocation for Spark 4+?
Some things that I've been thinking about:

- Advisory user input (e.g. a way to say "after X is done, I know I need Y", where Y might be a bunch of GPU machines)
- Configurable tolerance (e.g. if we are at most Z% over the target, no-op)
- Past runs of the same job (e.g. stage X of job Y had a peak of K)
- Faster executor launches (I'm a little fuzzy on what we can do here, but one area, for example, is that we set up and tear down an RPC connection to the driver with a blocking call, which at first glance does seem to involve some locking inside the driver)

Is this an area other folks are thinking about? Should I make an epic we can track ideas in? Or are folks generally happy with today's dynamic allocation (or just busy with other things)?

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau