[ https://issues.apache.org/jira/browse/SPARK-46018 ]

xiejiankun updated SPARK-46018:
-------------------------------
    Description: 
Without the `SPARK_LOCAL_HOSTNAME` attribute set in spark-env.sh, the driver is 
scheduled and launched normally.

The submitted command is:
{code:bash}
bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://XXX:7077 \
  --deploy-mode cluster \
  /opt/module/spark3.1/examples/jars/spark-examples_2.12-3.1.3.jar 1000{code}
The command that launches the driver addresses the worker by its real IP:
{code:java}
"spark://Worker@169.xxx.xxx.211:7078" {code}
The driver can run on any worker.

 

After adding the `SPARK_LOCAL_HOSTNAME` attribute on each worker, *the driver 
can no longer run on any worker other than the one from which the command was 
submitted.*
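
For reference, the change is of this form, a minimal sketch of the spark-env.sh 
entry (the value shown is illustrative; each node would use its own hostname):
{code:bash}
# /opt/module/spark3.1/conf/spark-env.sh  (illustrative sketch)
# Each worker sets this to its own hostname.
export SPARK_LOCAL_HOSTNAME=nodeB
{code}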

After the change, the command that launches the driver addresses the worker by hostname:
{code:java}
"spark://Worker@hostname:7078" {code}
The error message is:
{code:java}
Launch Command: "/opt/module/jdk1.8.0_371/bin/java" "-cp" "/opt/module/spark3.1/conf/:/opt/module/spark3.1/jars/*:/opt/module/hadoop-3.3.0/etc/hadoop/" "-Xmx6144M" "-Dspark.eventLog.enabled=true" "-Dspark.driver.cores=4" "-Dspark.jars=file:/opt/module/spark3.1/examples/jars/spark-examples_2.12-3.1.3.jar" "-Dspark.submit.deployMode=cluster" "-Dspark.sql.shuffle.partitions=60" "-Dspark.master=spark://nodeA:7077" "-Dspark.executor.cores=4" "-Dspark.driver.supervise=false" "-Dspark.app.name=org.apache.spark.examples.SparkPi" "-Dspark.driver.memory=6g" "-Dspark.eventLog.compress=true" "-Dspark.executor.memory=8g" "-Dspark.submit.pyFiles=" "-Dspark.eventLog.dir=hdfs://nodeA:8020/sparklog/" "-Dspark.rpc.askTimeout=10s" "-Dspark.default.parallelism=60" "-Dspark.history.fs.cleaner.enabled=false" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@nodeB:7078" "/opt/module/spark3.1/work/driver-20231121105901-0000/spark-examples_2.12-3.1.3.jar" "org.apache.spark.examples.SparkPi" "1000"
========================================

23/11/21 10:59:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/11/21 10:59:02 INFO SecurityManager: Changing view acls to: jaken
23/11/21 10:59:02 INFO SecurityManager: Changing modify acls to: jaken
23/11/21 10:59:02 INFO SecurityManager: Changing view acls groups to: 
23/11/21 10:59:02 INFO SecurityManager: Changing modify acls groups to: 
23/11/21 10:59:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jaken); groups with view permissions: Set(); users  with modify permissions: Set(jaken); groups with modify permissions: Set()
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'Driver' failed after 16 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'Driver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:438)
        at sun.nio.ch.Net.bind(Net.java:430)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:225)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:134)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:550)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
        at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
        at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
        at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
        at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:248)
        at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:750) {code}
I ruled out the driver port itself, since the same error is reported no matter 
what port is configured. I set the `SPARK_LOCAL_HOSTNAME` attribute in the 
first place to avoid the `Locality Level is ANY` problem; see 
[SPARK-10149|https://issues.apache.org/jira/browse/SPARK-10149].
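
One check that may narrow this down (a hedged suggestion, not verified on this 
cluster; node names are illustrative): confirm that the `SPARK_LOCAL_HOSTNAME` 
value on the worker chosen to launch the driver actually resolves to an address 
owned by that machine, since the driver's bind address is derived from it. For 
example:
{code:bash}
# Run on the worker chosen to launch the driver (names illustrative)
hostname                    # e.g. nodeB
getent hosts nodeB          # should resolve to an IP that belongs to this machine
ip addr show | grep -F "$(getent hosts nodeB | awk '{print $1}')"
{code}
If the hostname resolves to another node's address (or does not resolve at 
all), every bind attempt fails exactly as in the log above; explicitly setting 
`spark.driver.bindAddress` would be another way to test this.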


> driver fails to start properly in standalone cluster deployment mode
> --------------------------------------------------------------------
>
>                 Key: SPARK-46018
>                 URL: https://issues.apache.org/jira/browse/SPARK-46018
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Spark Submit
>    Affects Versions: 3.1.3
>            Reporter: xiejiankun
>            Priority: Major
>


