On Tue, Dec 15, 2020 at 11:25 AM Roy Smith <r...@panix.com> wrote:
>
> Thanks. Backing up a step, what I'm looking to do is build some kind of
> performance and monitoring dashboard for my tool. From what you say, maybe
> Thanos is not the right thing for that?
Thanos is an aggregating data store for the Prometheus metrics that we collect in the Wikimedia production network. We do not ship any metrics for Cloud VPS or Toolforge into that environment.

We have some metrics available for Toolforge tools, but not as many as we would like. The best monitoring we have currently is for the Toolforge Kubernetes cluster and the workloads that run there. The k8s-status tool shows read-only information about the Toolforge Kubernetes cluster. At <https://k8s-status.toolforge.org/namespaces/tool-slow-parse/> you can see information about Roy's slow-parse tool. From there you can follow the 'Grafana dashboard' link to a Grafana dashboard that shows collected metrics about the Pods that have run in the slow-parse tool's Kubernetes namespace.

Someday™ we will make time to build out more monitoring for both Toolforge tools and Cloud VPS tenants. There are several Phabricator tasks in the extended backlog with wishes that folks have made about such things. <https://phabricator.wikimedia.org/T194333> is one that has some really high-level ideas on it and some more concrete subtasks.

Bryan
--
Bryan Davis
Technical Engagement
Wikimedia Foundation
Principal Software Engineer
Boise, ID USA
[[m:User:BDavis_(WMF)]]
irc: bd808

_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud