Having trouble collecting metrics from tservers

Scott Reynolds Mon, 02 Jul 2018 15:20:02 -0700

List,

We are struggling to collect metrics from the tserver. We are attempting to
pull down the following metrics per tablet:
allowed_metrics = [
    "rows_inserted",
    "rows_upserted",
    "rows_deleted",
    "scanner_rows_scanned",
    "upserts_as_updates",
    "rows_updated",
    "insertions_failed_dup_key"
]
by querying the metrics JSON endpoint.


We see the metrics being sent, but they all appear to have the same value.

I have the following questions:
1. Are metrics unique per `id` in the JSON payload?
2. How do others collect metrics for their clusters?
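
One way to check question 1 offline is to save a response from the endpoint and compare each metric's value across entity ids. The following is only a minimal sketch: it assumes the payload is a list of entities carrying `type`, `id`, `attributes`, and `metrics` keys (the shape the loop in this message relies on), the helper names are hypothetical, and the sample payload is made up rather than taken from a real cluster.

```python
from collections import defaultdict


def values_by_entity(payload, metric_names):
    """Map each selected metric name to {entity id: value}."""
    seen = defaultdict(dict)
    for entity in payload:
        for metric in entity.get("metrics", []):
            name = metric.get("name")
            # Only counters and gauges carry a plain "value" key.
            if name in metric_names and "value" in metric:
                seen[name][entity["id"]] = metric["value"]
    return dict(seen)


def identical_across_entities(payload, metric_names):
    """Return the metric names that report one value for every entity id."""
    suspicious = []
    for name, per_id in values_by_entity(payload, metric_names).items():
        if len(per_id) > 1 and len(set(per_id.values())) == 1:
            suspicious.append(name)
    return sorted(suspicious)


# Made-up sample payload for illustration only.
sample = [
    {"type": "tablet", "id": "tablet-a", "attributes": {"table_name": "t1"},
     "metrics": [{"name": "rows_inserted", "value": 10},
                 {"name": "rows_deleted", "value": 0}]},
    {"type": "tablet", "id": "tablet-b", "attributes": {"table_name": "t1"},
     "metrics": [{"name": "rows_inserted", "value": 25},
                 {"name": "rows_deleted", "value": 0}]},
]

print(identical_across_entities(sample, {"rows_inserted", "rows_deleted"}))
```

If every metric of interest shows up in that list for a real payload, the values really are identical at the source; if not, the duplication is more likely introduced on the sending side.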

Any help would be appreciated, thanks!

Our while loop looks like this:
# Excerpt from a method on our collector class. It relies on these
# module-level imports, a configured `log`, and `allowed_metrics` above.
import requests
from time import time

while not self.shutdown_event.is_set():
    try:
        collection_time = time()
        http_response = requests.get(
            "%s://localhost:%s/metrics" % (self.protocol, self.port),
            verify=False)
        # The payload is a list of entities, each with its own metrics.
        for metric_type in http_response.json():
            metric_prefix = metric_type['type']
            for metric in metric_type['metrics']:
                if metric["name"] not in allowed_metrics:
                    continue
                full_name = metric_prefix + "." + metric["name"]
                # Send every numeric field of the metric as its own gauge.
                for key, value in metric.items():
                    if key == "name":
                        continue
                    log.info("%s_%s -> %s", full_name, key, value)
                    try:
                        point = float(value)
                    except (TypeError, ValueError):
                        log.info("%s is not a number. Not sending", value)
                        continue
                    # Tag each point with the entity's attributes and id.
                    tags = metric_type['attributes'].copy()
                    tags['id'] = metric_type['id']
                    self.metrics_client.gauge(
                        "%s_%s" % (full_name, key),
                        point,
                        timestamp=collection_time,
                        tags=tags)
        self.metrics_client.flush(timestamp=collection_time)
    except Exception as ex:
        # A "%s" placeholder is needed for the logger to format `ex`.
        log.error("Failed to parse kudu metrics: %s", ex)
    log.info("Pausing for 10 seconds after processing metrics")
    self.shutdown_event.wait(10)
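
To make the parsing easier to test separately from the HTTP call and the metrics client, the inner loops could be pulled into a pure function. This is a hedged sketch, not what we currently run: `extract_points` is a hypothetical name, the payload shape is the same assumption as in the loop above, and the sample data is invented.

```python
def extract_points(payload, allowed):
    """Flatten a metrics payload into (gauge name, value, tags) tuples."""
    points = []
    for entity in payload:
        # Tags are the entity's attributes plus its id.
        tags = dict(entity.get("attributes", {}))
        tags["id"] = entity["id"]
        for metric in entity.get("metrics", []):
            if metric.get("name") not in allowed:
                continue
            full_name = "%s.%s" % (entity["type"], metric["name"])
            for key, value in metric.items():
                if key == "name":
                    continue
                try:
                    point = float(value)
                except (TypeError, ValueError):
                    continue  # skip non-numeric fields
                # Copy the tags so each point owns its own dict.
                points.append(("%s_%s" % (full_name, key), point, dict(tags)))
    return points


# Invented sample payload for illustration only.
sample_payload = [
    {"type": "tablet", "id": "tablet-a", "attributes": {"table_name": "t1"},
     "metrics": [{"name": "rows_inserted", "value": 3},
                 {"name": "not_allowed", "value": 1}]},
]

points = extract_points(sample_payload, {"rows_inserted"})
```

Printing or asserting on `points` for a saved payload shows exactly which gauge names, values, and `id` tags would be sent, without touching the metrics client.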
