So I was wondering if there is interest in revisiting some of how Spark does its dynamic allocation for Spark 4+?
Some things that I've been thinking about:

- Advisory user input (e.g. a way to say "after X is done, I know I need Y", where Y might be a bunch of GPU machines)
- Configurable tolerance (e.g. if we are at most Z% over the target, no-op)
- Past runs of the same job (e.g. stage X of job Y had a peak of K)
- Faster executor launches (I'm a little fuzzy on what we can do here, but one area, for example, is that we set up and tear down an RPC connection to the driver with a blocking call, which at first glance does seem to involve some locking inside the driver)

Is this an area other folks are thinking about? Should I make an epic we can track ideas in? Or are folks generally happy with today's dynamic allocation (or just busy with other things)?

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau