Hi, I can run successfully in cluster mode but when trying in client mode the job won't run.
I have a driver pod created from jupyter/pyspark-notebook that I would assume is sufficient. I execute: bin/spark-submit \ --master k8s://https://192.168.99.100:8443 \ --deploy-mode client \ --conf spark.executor.instances=1 \ --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \ --conf spark.kubernetes.container.image=spark:spark-docker \ --conf spark.driver.host=jupyter-pyspark.default.svc.cluster.local \ --conf spark.driver.port=9888 \ --class org.apache.spark.examples.SparkPi \ --name spark-pi \ local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar 10 I am then looking at the created executor pod. All seems well, it can connect back to the driver pod on 9888 but then: 19/05/26 11:35:38 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks java.io.IOException: Connecting to jupyter-pyspark.default.svc.cluster.local/10.111.230.250:37615 timed out (120000 ms) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:243) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187) at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:114) at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141) at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:121) at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:124) at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:98) at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:757) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:162) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.broadcast.TorrentBroadcast.org $apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:151) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:231) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) What does this mean? Thanks in advance David