Hello,

On Fri, Feb 28, 2020 at 13:21 Xingbo Jiang <jiangxb1...@gmail.com> wrote:
> Hi all,
>
> Based on my experience, there is no scenario that necessarily requires
> deploying multiple Workers on the same node with the Standalone backend. A
> Worker should book all the resources reserved to Spark on the host it is
> launched on, and it can then allocate those resources to one or more
> executors launched by this Worker. Since each executor runs in a separate
> JVM, we can limit the memory of each executor to avoid long GC pauses.
>
> The remaining concern is that local-cluster mode is implemented by
> launching multiple Workers on the local host, so we might need to
> re-implement LocalSparkCluster to launch only one Worker and multiple
> executors. That should be fine because local-cluster mode is only used in
> running Spark unit test cases, so end users should not be affected by this
> change.
>
> Removing support for multiple Workers on the same host would simplify the
> deploy model of the Standalone backend, and also reduce the burden of
> supporting legacy deploy patterns in future feature development. (There is
> an example in https://issues.apache.org/jira/browse/SPARK-27371 , where we
> designed a complex approach to coordinate resource requirements from
> different Workers launched on the same host.)
>
> The proposal is to update the documentation to deprecate support for the
> system environment variable `SPARK_WORKER_INSTANCES` in Spark 3.0, and to
> remove the support in the next major version (Spark 3.1).
>
> Please kindly let me know if you have use cases relying on this feature.

When deploying Spark on batch systems (by wrapping the standalone
deployment in scripts that can be consumed by the batch scheduler), we
typically end up with >1 worker per host. If I understand correctly, this
proposal would make our use case unsupported.

Thanks,
Andrew

> Thanks!
>
> Xingbo

-- 
It's dark in this basement.
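For context, the setting under discussion lives in conf/spark-env.sh on each
host. A rough sketch of the two layouts being compared follows; the variable
names are the standard standalone-mode settings, but all of the numbers are
illustrative, not values taken from this thread:

    # Multi-worker layout the proposal would deprecate: several Workers per
    # host, each booking a slice of the machine.
    SPARK_WORKER_INSTANCES=4
    SPARK_WORKER_CORES=8
    SPARK_WORKER_MEMORY=16g

    # Single-worker layout the proposal favors: one Worker books the whole
    # host, and executor sizing (set per application, e.g. via spark-submit)
    # keeps each JVM heap small enough to avoid long GC pauses.
    SPARK_WORKER_INSTANCES=1    # or simply left unset
    SPARK_WORKER_CORES=32
    SPARK_WORKER_MEMORY=64g
    #   --conf spark.executor.cores=8 --conf spark.executor.memory=16g

    # The local-cluster mode mentioned above is the test-only master URL
    # local-cluster[numWorkers,coresPerWorker,memoryPerWorkerMB],
    # e.g. local-cluster[2,1,1024].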