Hi,

With jmx_exporter <https://github.com/prometheus/jmx_exporter> and
Prometheus you can always rewrite the metric name patterns on the fly. Btw,
if you use Grafana it's easy to filter things even without the rewrite.
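
For instance, a jmx_exporter rule can strip the app and executor ids out of
the metric name and turn them into labels instead. Rough sketch only: it
assumes Spark's JmxSink exposes Dropwizard metrics under the "metrics" JMX
domain and that executor ids are numeric, so adapt the regexes to your
actual names:

    rules:
      # driver MBeans: "metrics:name=<app-id>.driver.jvm.heap.used"
      - pattern: "metrics<name=([^.]+)\\.driver\\.(.+)><>Value"
        name: spark_driver_$2
        labels:
          app_id: "$1"
      # executor MBeans: "metrics:name=<app-id>.<executor-id>.jvm.heap.used"
      - pattern: "metrics<name=([^.]+)\\.(\\d+)\\.(.+)><>Value"
        name: spark_executor_$3
        labels:
          app_id: "$1"
          executor_id: "$2"
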
If this is a custom dashboard you can always group metrics based on the
spark.app.id prefix, no? Also, I think sometimes it's good to know if a
specific executor failed and why, and to report execution-specific metrics.
For example, if you have skewed data that caused JVM issues etc.
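
If you do keep the prefixed names, a Grafana panel can still select all of
them with a regex on the metric name, e.g. in PromQL (treat this as a
sketch, the exact pattern depends on how your names come out):

    {__name__=~".+_driver_jvm_heap_used"}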

Stavros
On Mon, May 6, 2019 at 11:29 PM Anton Kirillov <akiril...@mesosphere.com>
wrote:

> Hi everyone!
>
> We are currently working on building a unified monitoring/alerting
> solution for Spark and would like to rely on Spark's own metrics to avoid
> divergence from the upstream. One of the challenges is to support metrics
> coming from multiple Spark applications running on a cluster: scheduled
> jobs, long-running streaming applications etc.
>
> Original problem:
> Spark assigns metric names using *spark.app.id* and *spark.executor.id*
> as part of them. Thus the number of metrics grows continuously, because
> those IDs are unique between executions whereas the metrics themselves
> report the same thing. Another issue that arises here is how to use
> constantly changing metric names in dashboards.
>
> For example, *jvm_heap_used* reported by all Spark instances (components):
> - <spark.app.id>_driver_jvm_heap_used (Driver)
> - <spark.app.id>_<spark.executor.id>_jvm_heap_used (Executors)
>
> While *spark.app.id* can be overridden with *spark.metrics.namespace*,
> there's no such option for *spark.executor.id*, which makes it impossible
> to build a reusable dashboard because (given the uniqueness of IDs)
> differently named metrics are emitted for each execution.
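>
> For illustration, the namespace override looks like this ("my_app" is
> just a placeholder name):
>
>     spark-submit --conf spark.metrics.namespace=my_app ...
>
> which yields my_app_driver_jvm_heap_used for the driver, while executors
> still emit my_app_<spark.executor.id>_jvm_heap_used.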
>
> One possible solution would be to make executor metric names follow the
> driver's metric name pattern, e.g.:
> - <spark.app.id>_driver_jvm_heap_used (Driver)
> - <spark.app.id>_executor_jvm_heap_used (Executors)
>
> and distinguish executors based on tags (tags would need to be configured
> in the metric reporters in this case). Not sure if this could potentially
> break the Driver UI, though.
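>
> To illustrate the proposal, a tag-aware reporter could then expose a
> uniformly named executor metric with the id carried as a tag, e.g. in
> Prometheus exposition format (label name and values are hypothetical):
>
>     <spark.app.id>_executor_jvm_heap_used{executor_id="1"} 514654864
>     <spark.app.id>_executor_jvm_heap_used{executor_id="2"} 498216960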
>
> I'd really appreciate any feedback on this issue and would be happy to
> create a Jira issue/PR if this change looks sane to the community.
>
> Thanks in advance.
>
> --
> *Anton Kirillov*
> Senior Software Engineer, Mesosphere
>
