On Tue, Dec 15, 2020 at 11:25 AM Roy Smith <r...@panix.com> wrote:
>
> Thanks.  Backing up a step, what I'm looking to do is build some kind of 
> performance and monitoring dashboard for my tool.  From what you say, maybe 
> Thanos is not the right thing for that?

Thanos is an aggregating data store for the Prometheus metrics that we
collect in the Wikimedia production network. We do not ship any
metrics for Cloud VPS or Toolforge into that environment.

We have some metrics available for Toolforge tools, but not as many as
we would like. The best monitoring we have currently is for the
Toolforge Kubernetes cluster and the workloads that run there. The
k8s-status tool shows read-only information about the Toolforge
Kubernetes cluster. At
<https://k8s-status.toolforge.org/namespaces/tool-slow-parse/> you can
see information about Roy's slow-parse tool. From there you can follow
the 'Grafana dashboard' link to a Grafana dashboard that shows
collected metrics about the Pods that have run in the slow-parse
tool's Kubernetes namespace.
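
For anyone who wants to find the same page for a different tool, here is a
minimal sketch that builds the k8s-status URL from a tool name. It assumes
the namespace is always "tool-" followed by the tool name, which is inferred
only from the slow-parse URL above; the function name and constant are
hypothetical and not part of any Toolforge API.

```python
# Base URL of the k8s-status tool, taken from the link above.
K8S_STATUS_BASE = "https://k8s-status.toolforge.org"


def namespace_url(tool_name: str) -> str:
    """Return the k8s-status page URL for a tool's Kubernetes namespace.

    Assumption: the namespace is named "tool-<tool_name>", as in the
    slow-parse example. This convention is inferred, not confirmed.
    """
    return f"{K8S_STATUS_BASE}/namespaces/tool-{tool_name}/"


if __name__ == "__main__":
    print(namespace_url("slow-parse"))
```

The 'Grafana dashboard' link still has to be followed from the resulting
page; this sketch does not try to guess the Grafana URL.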

Someday™ we will make time to build out more monitoring for both
Toolforge tools and Cloud VPS tenants. There are several Phabricator
tasks in the extended backlog with wishes that folks have made about
such things. <https://phabricator.wikimedia.org/T194333> is one that
has some high-level ideas on it and some more concrete subtasks.

Bryan
-- 
Bryan Davis              Technical Engagement      Wikimedia Foundation
Principal Software Engineer                               Boise, ID USA
[[m:User:BDavis_(WMF)]]                                      irc: bd808

_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
