I’ve been trying to set up monitoring for our Spark 3.0.1 cluster running in 
K8s. We are using Prometheus as our monitoring system, and we need both driver 
and executor metrics. My initial approach was to use the following configuration 
to expose both sets of metrics on the Spark UI:

{
    'spark.ui.prometheus.enabled': 'true'
}
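
For context, this is just passed through as a regular Spark conf; setting it 
programmatically on the session builder would be the equivalent, something like 
(app name is only a placeholder):

import org.apache.spark.sql.SparkSession

// Minimal sketch: enables the Prometheus endpoints served from the driver UI.
val spark = SparkSession.builder()
  .appName("metrics-demo")
  .config("spark.ui.prometheus.enabled", "true")
  .getOrCreate()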

I was able to scrape http://<driver_hostname>:4040/metrics/prometheus/ for 
driver metrics and http://<driver_hostname>:4040/metrics/executors/prometheus/ 
for executor metrics. However, the executor endpoint only contains the metrics 
defined here: https://spark.apache.org/docs/latest/monitoring.html#executor-metrics, 
which is referred to as ExecutorSummary. What I would like instead is all 
metrics from the executor instance metric system: 
https://spark.apache.org/docs/latest/monitoring.html#component-instance--executor.
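
For completeness, the Prometheus side is essentially just two scrape jobs 
against the driver UI port, along these lines (static targets shown only as 
placeholders):

scrape_configs:
  - job_name: 'spark-driver'
    metrics_path: '/metrics/prometheus/'
    static_configs:
      - targets: ['<driver_hostname>:4040']
  - job_name: 'spark-executor-summaries'
    metrics_path: '/metrics/executors/prometheus/'
    static_configs:
      - targets: ['<driver_hostname>:4040']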

I am not sure if these are available on the driver at all, so I’ve been 
thinking of scraping the executors directly instead. PrometheusServlet seems to 
be meant for this purpose, but the executors aren't running web servers, and I 
can't find a configuration setting that would open up a port on the executor 
container so that it can be scraped. What I have in mind right now is writing a 
custom sink that exports the metrics in the Prometheus text format to a local 
file, and running an nginx sidecar container that serves that static file; the 
nginx endpoint could then be scraped by Prometheus. Am I overcomplicating this? 
Is there a simpler approach?

Thanks,
David Szakallas
