Hmm, I’m not asking about using k8s to control Spark as a Job manager or
scheduler like Yarn. We use the built-in standalone Spark Job Manager and
spark://spark-api:7077 as the master, not k8s.
We use k8s only to manage a cluster consisting of our app, some
databases, and Spark (one master, one driver, several executors). The problem
is that some kind of callback from Spark uses the pod ID as the host name
and fails to connect because of that. We have tried deployMode
“client” and “cluster” but get the same error.
The full trace is below but the important bit is:
Failed to connect to harness-64d97d6d6-6n7nh:46337
This came from deployMode = “client”, and the port is the driver port, which
should be on the launching pod. For some reason it is using a pod ID instead of
a real address. Doesn’t the driver run in the launching app’s process? The
launching app is on the pod with ID harness-64d97d6d6-6n7nh, but it has the k8s
DNS address harness-api. I can see the correct address for the launching pod
with "kubectl get services".
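For reference, here is a minimal sketch of the settings that, as far as I understand, decide which address the driver advertises to executors. The property names (spark.driver.host, spark.driver.bindAddress, spark.driver.port) are real Spark settings. The helper names and the values (harness-api, a fixed port of 46337) are assumptions for illustration, not a confirmed fix.

```python
# Sketch only: the property names are real Spark settings, but the helper
# functions are hypothetical and the values are assumptions from this thread.
def driver_network_conf(service_host: str, driver_port: int) -> dict:
    """Return Spark properties that pin the address the driver advertises."""
    return {
        # Host name executors are told to call back on; it must resolve
        # from the executor pods (a k8s Service name, not a pod ID).
        "spark.driver.host": service_host,
        # Interface the driver listens on inside its own pod.
        "spark.driver.bindAddress": "0.0.0.0",
        # Fixed port, so a k8s Service can expose it instead of a random one.
        "spark.driver.port": str(driver_port),
    }


def to_submit_args(conf: dict) -> list:
    """Render the properties as spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


conf = driver_network_conf("harness-api", 46337)
print(" ".join(to_submit_args(conf)))
```

The same key/value pairs could be set on the SparkConf in the launching app instead of being passed on the command line. The Service in front of the launching pod would presumably also have to expose the chosen driver port.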
The error is:
Spark Executor Command: "/usr/lib/jvm/java-1.8-openjdk/bin/java" "-cp"
"/spark/conf/:/spark/jars/*:/etc/hadoop/" "-Xmx1024M"
"-Dspark.driver.port=46337"
"org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url"
"spark://CoarseGrainedScheduler@harness-64d97d6d6-6n7nh:46337" "--executor-id"
"138" "--hostname" "10.31.31.174" "--cores" "8" "--app-id"
"app-20190213210105-" "--worker-url" "spark://Worker@10.31.31.174:37609"
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:63)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
... 4 more
Caused by: java.io.IOException: Failed to connect to harness-64d97d6d6-6n7nh:46337
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: harness-64d97d6d6-6n7nh
at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at java.net.InetAddress.getByName(InetAddress.java:1077)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
at java.security.AccessController.doPrivileged(Native Method)
at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
at