From: Pat Ferrel <p...@actionml.com>
Reply: Pat Ferrel <p...@actionml.com>
Date: February 12, 2019 at 5:40:41 PM
To: user@spark.apache.org <user@spark.apache.org>
Subject: Spark with Kubernetes connecting to pod ID, not address

We have a k8s deployment of several services, including Apache Spark. All
services seem to be operational. Our application connects to the Spark master
to submit a job using the k8s DNS service for the cluster, where the master is
called `spark-api`, so we use `master=spark://spark-api:7077` and
`spark.submit.deployMode=cluster`. We submit the job through the API, not with
the spark-submit script.
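For reference, here are the two submission settings above written out in the
`spark-defaults.conf` "key value" style. This is a minimal sketch. It assumes
`master=` corresponds to the `spark.master` property. The helper
`to_spark_defaults` is hypothetical and only formats the values quoted in this
message. It is not our actual submission code.

```python
# Hypothetical sketch: the submission settings described above, rendered as
# spark-defaults.conf lines. Only the two values come from the message.
SUBMIT_CONF = {
    "spark.master": "spark://spark-api:7077",  # k8s DNS name of the standalone master
    "spark.submit.deployMode": "cluster",      # driver runs on the cluster, not in the caller
}


def to_spark_defaults(conf: dict) -> str:
    """Render a conf dict as whitespace-separated key/value lines, sorted by key."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    print(to_spark_defaults(SUBMIT_CONF))
```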

This runs the "driver" and all "executors" on the cluster, and that part seems
to work, but there is a callback from some Spark process to the launching code
in our app. For some reason it is trying to connect to
`harness-64d97d6d6-4r4d8`, which is the **pod ID**, not the k8s cluster IP or
DNS name.

How could this **pod ID** be getting into the system? Spark seems to think it
is the address of the service that called it. Needless to say, any connection
to the k8s pod ID fails, and so does the job.

Any idea how Spark could think the **pod ID** is an IP address or DNS name? 
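One hedged guess worth checking: by default, Kubernetes sets a pod's hostname to
the pod name. Spark's `spark.driver.host` is documented to default to the local
hostname. A process in the launching pod might therefore advertise
`harness-64d97d6d6-4r4d8` unless told otherwise. The sketch below is meant to be
run inside that pod. It prints the name and address the pod reports for itself.
`describe_local_host` is a hypothetical helper built only on Python's standard
`socket` module.

```python
import socket


def describe_local_host() -> dict:
    """Report the hostname, FQDN and resolved IP this process sees for itself."""
    hostname = socket.gethostname()  # in a k8s pod this defaults to the pod name
    info = {"hostname": hostname, "fqdn": socket.getfqdn()}
    try:
        info["ip"] = socket.gethostbyname(hostname)
    except OSError:
        info["ip"] = None  # the name does not resolve from this host
    return info


if __name__ == "__main__":
    for key, value in describe_local_host().items():
        print(f"{key}: {value}")
```

Suppose `hostname` matches the pod name and other pods cannot resolve it. In that
case, one untested thing to try is setting `spark.driver.host` explicitly, for
example to the pod IP. Whether that covers the callback in cluster deploy mode
is an assumption.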

BTW, if we run a small sample job with `master=local`, all is well, but the
same job executed with the above config tries to connect to the spurious pod ID.

BTW2: the pod launching the Spark job has the k8s DNS name `harness-api`; not
sure if this matters.

Thanks in advance
