Hi all,

I have a PR https://github.com/apache/spark/pull/22381 that exposes
application status metrics (related JIRA: SPARK-25394).

Currently, metrics tooling has to scrape the REST API to get metrics
such as job delay, failed stages, completed stages, etc.
From a DevOps perspective, it is good to standardize on a unified way of
gathering metrics.
The need came up on the K8s side, where the JMX Prometheus exporter is
commonly used to scrape metrics from several components such as Kafka and
Cassandra, but the need is not limited to K8s.
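For illustration, here is a minimal sketch in Python of what scraping the
REST API for stage counts looks like today. It assumes the driver UI is
reachable at http://localhost:4040 (Spark's default UI port) and uses the
documented /api/v1/applications/[app-id]/stages endpoint. The helper names
count_stages_by_status and fetch_stage_counts are hypothetical. Every
monitoring tool has to repeat this poll/parse/aggregate step per
application, whereas metrics emitted through the metrics system (e.g. JMX)
could be picked up by the same collector as everything else.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Assumption: the driver UI is reachable here (4040 is Spark's default
# UI port); adjust for your deployment.
BASE_URL = "http://localhost:4040/api/v1"


def count_stages_by_status(stages):
    """Count stages per status from a parsed /stages response.

    Each stage object carries a "status" field such as "ACTIVE",
    "COMPLETE", "PENDING", "FAILED" or "SKIPPED".
    """
    return Counter(stage["status"] for stage in stages)


def fetch_stage_counts(app_id, base_url=BASE_URL):
    """Hypothetical helper: poll the REST API once for one application
    and return the per-status stage counts."""
    with urlopen(f"{base_url}/applications/{app_id}/stages") as resp:
        stages = json.load(resp)
    return count_stages_by_status(stages)


# Usage on an already-parsed sample payload (no network access needed):
sample = [
    {"stageId": 0, "status": "COMPLETE"},
    {"stageId": 1, "status": "FAILED"},
    {"stageId": 2, "status": "COMPLETE"},
]
print(count_stages_by_status(sample))  # Counter({'COMPLETE': 2, 'FAILED': 1})
```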

See this comment on the PR
<https://github.com/apache/spark/pull/22381#issuecomment-420029771>:
"The rest api is great for UI and consolidated analytics, but monitoring
through it is not as straightforward as when the data emits directly from
the source like this. There is all kinds of nice context that we get when
the data from this spark node is collected directly from the node itself,
and not proxied through another collector / reporter. It is easier to build
a monitoring data model across the cluster when node, jmx, pod, resource
manifests, and spark data all align by virtue of coming from the same
collector. Building a similar view of the cluster just from the rest api,
as a comparison, is simply harder and quite challenging to do in general
purpose terms."

The PR is ready to be merged, but the major concern is that it mirrors
metrics that are already available through the UI and the REST API. I think
mirroring is fine, since some people may not want to check the UI and just
want to integrate with JMX (my use case) and gather metrics in Grafana (a
common setup).

Do any of the committers or members of the community have an opinion on
this? Is there agreement on moving forward with this? Note that the addition
does not change much and can always be refactored if we come up with a new
plan for the metrics story in the future.

Thanks,
Stavros
