[ https://issues.apache.org/jira/browse/SPARK-46018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xiejiankun updated SPARK-46018:
-------------------------------
Description: 
Without setting the `SPARK_LOCAL_HOSTNAME` variable in spark-env.sh, the driver is placed and launched normally.

The submitted command is:
{code:java}
bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://XXX:7077 --deploy-mode cluster /opt/module/spark3.1/examples/jars/spark-examples_2.12-3.1.3.jar 1000{code}
The command that starts the driver uses the real IP of the worker:
{code:java}
"spark://Worker@169.xxx.xxx.211:7078" {code}
The driver can run on any worker.

After setting SPARK_LOCAL_HOSTNAME on each worker, *the driver cannot run on any worker other than the one that submitted the command.*
The command that starts the driver uses the hostname of the worker:
{code:java}
"spark://Worker@hostname:7078" {code}
The error message is:
{code:java}
Launch Command: "/opt/module/jdk1.8.0_371/bin/java" "-cp" "/opt/module/spark3.1/conf/:/opt/module/spark3.1/jars/*:/opt/module/hadoop-3.3.0/etc/hadoop/" "-Xmx6144M" "-Dspark.eventLog.enabled=true" "-Dspark.driver.cores=4" "-Dspark.jars=file:/opt/module/spark3.1/examples/jars/spark-examples_2.12-3.1.3.jar" "-Dspark.submit.deployMode=cluster" "-Dspark.sql.shuffle.partitions=60" "-Dspark.master=spark://nodeA:7077" "-Dspark.executor.cores=4" "-Dspark.driver.supervise=false" "-Dspark.app.name=org.apache.spark.examples.SparkPi" "-Dspark.driver.memory=6g" "-Dspark.eventLog.compress=true" "-Dspark.executor.memory=8g" "-Dspark.submit.pyFiles=" "-Dspark.eventLog.dir=hdfs://nodeA:8020/sparklog/" "-Dspark.rpc.askTimeout=10s" "-Dspark.default.parallelism=60" "-Dspark.history.fs.cleaner.enabled=false" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@nodeB:7078" "/opt/module/spark3.1/work/driver-20231121105901-0000/spark-examples_2.12-3.1.3.jar" "org.apache.spark.examples.SparkPi" "1000"
========================================
23/11/21 10:59:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/11/21 10:59:02 INFO SecurityManager: Changing view acls to: jaken
23/11/21 10:59:02 INFO SecurityManager: Changing modify acls to: jaken
23/11/21 10:59:02 INFO SecurityManager: Changing view acls groups to: 
23/11/21 10:59:02 INFO SecurityManager: Changing modify acls groups to: 
23/11/21 10:59:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jaken); groups with view permissions: Set(); users with modify permissions: Set(jaken); groups with modify permissions: Set()
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
Exception in thread "main" java.net.BindException: Unable to specify the requested address: Service 'Driver' failed after 16 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'Driver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:438)
	at sun.nio.ch.Net.bind(Net.java:430)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:225)
	at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:134)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:550)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
	at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
	at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
	at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:248)
	at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:750) {code}
I ruled out the driver port, since the same error was reported no matter how the port was set.
I used the `SPARK_LOCAL_HOSTNAME` variable to avoid the `Locality Level is ANY` problem; see [SPARK-10149|https://issues.apache.org/jira/browse/SPARK-10149]

was:
Without setting the `SPARK_LOCAL_HOSTNAME` variable in spark-env.sh, the driver is placed and launched normally.
The submitted command is:
{code:java}
bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://XXX:7077 --deploy-mode cluster /opt/module/spark3.1/examples/jars/spark-examples_2.12-3.1.3.jar 1000{code}
The command that starts the driver uses the real IP of the worker:
{code:java}
"spark://Worker@169.xxx.xxx.211:7078" {code}
The driver can run on any worker.

After setting SPARK_LOCAL_HOSTNAME on each worker, *the driver cannot run on any worker other than the one that submitted the command.*
The command that starts the driver uses the hostname of the worker:
{code:java}
"spark://Worker@hostname:7078" {code}
The error message is:
{code:java}
Launch Command: "/opt/module/jdk1.8.0_371/bin/java" "-cp" "/opt/module/spark3.1/conf/:/opt/module/spark3.1/jars/*:/opt/module/hadoop-3.3.0/etc/hadoop/" "-Xmx6144M" "-Dspark.eventLog.enabled=true" "-Dspark.driver.cores=4" "-Dspark.jars=file:/opt/module/spark3.1/examples/jars/spark-examples_2.12-3.1.3.jar" "-Dspark.submit.deployMode=cluster" "-Dspark.sql.shuffle.partitions=60" "-Dspark.master=spark://xy04:7077" "-Dspark.executor.cores=4" "-Dspark.driver.supervise=false" "-Dspark.app.name=org.apache.spark.examples.SparkPi" "-Dspark.driver.memory=6g" "-Dspark.eventLog.compress=true" "-Dspark.executor.memory=8g" "-Dspark.submit.pyFiles=" "-Dspark.eventLog.dir=hdfs://xy04:8020/sparklog/" "-Dspark.rpc.askTimeout=10s" "-Dspark.default.parallelism=60" "-Dspark.history.fs.cleaner.enabled=false" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@xy04:7078" "/opt/module/spark3.1/work/driver-20231121105901-0000/spark-examples_2.12-3.1.3.jar" "org.apache.spark.examples.SparkPi" "1000"
========================================
23/11/21 10:59:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/11/21 10:59:02 INFO SecurityManager: Changing view acls to: jaken
23/11/21 10:59:02 INFO SecurityManager: Changing modify acls to: jaken
23/11/21 10:59:02 INFO SecurityManager: Changing view acls groups to: 
23/11/21 10:59:02 INFO SecurityManager: Changing modify acls groups to: 
23/11/21 10:59:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jaken); groups with view permissions: Set(); users with modify permissions: Set(jaken); groups with modify permissions: Set()
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
Exception in thread "main" java.net.BindException: Unable to specify the requested address: Service 'Driver' failed after 16 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'Driver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:438)
	at sun.nio.ch.Net.bind(Net.java:430)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:225)
	at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:134)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:550)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
	at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
	at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
	at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:248)
	at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:750) {code}
I ruled out the driver port, since the same error was reported no matter how the port was set.
I used the `SPARK_LOCAL_HOSTNAME` variable to avoid the `Locality Level is ANY` problem; see [SPARK-10149|https://issues.apache.org/jira/browse/SPARK-10149]

> driver fails to start properly in standalone cluster deployment mode
> --------------------------------------------------------------------
>
>                 Key: SPARK-46018
>                 URL: https://issues.apache.org/jira/browse/SPARK-46018
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Spark Submit
>    Affects Versions: 3.1.3
>            Reporter: xiejiankun
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
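Side note on the mechanism: the `BindException` above is the generic "cannot assign requested address" failure (`EADDRNOTAVAIL`) that the OS returns whenever a process tries to bind a listening socket to a hostname or IP that does not belong to the local machine, which is consistent with the submitting node's `SPARK_LOCAL_HOSTNAME` being used as the bind address on a different worker. A minimal sketch with plain Python sockets, not Spark (the address `203.0.113.1` is from the TEST-NET-3 documentation range, so no host owns it):

```python
import errno
import socket

def try_bind(host: str, port: int = 0) -> str:
    """Attempt to bind a listening socket to (host, port); return the outcome."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # port 0 asks the OS for any free port, like Spark's random-port retries
        s.bind((host, port))
        return "bound"
    except OSError as e:
        # EADDRNOTAVAIL is the errno behind java.net.BindException
        # ("Cannot assign requested address") seen in the driver log
        return "EADDRNOTAVAIL" if e.errno == errno.EADDRNOTAVAIL else f"errno {e.errno}"
    finally:
        s.close()

print(try_bind("127.0.0.1"))    # a local address binds fine
print(try_bind("203.0.113.1"))  # an address this host does not own cannot bind
```

If this is the cause, a plausible workaround (untested here) would be to set `spark.driver.bindAddress` explicitly, or to ensure each worker's `SPARK_LOCAL_HOSTNAME` resolves to an address that worker actually owns.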