[ https://issues.apache.org/jira/browse/BEAM-7528?focusedWorklogId=278193&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278193 ]
ASF GitHub Bot logged work on BEAM-7528:
----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Jul/19 11:54
Start Date: 17/Jul/19 11:54
Worklog Time Spent: 10m

Work Description: kkucharc commented on pull request #8941: [BEAM-7528] Save load test metrics according to distribution name
URL: https://github.com/apache/beam/pull/8941#discussion_r304360938

##########
File path: sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py
##########

@@ -138,8 +143,25 @@ def as_dict(self):
 class CounterMetric(Metric):
   def __init__(self, counter_dict, submit_timestamp, metric_id):
     super(CounterMetric, self).__init__(submit_timestamp, metric_id)
-    self.value = counter_dict.committed
     self.label = str(counter_dict.key.metric.name)
+    self.value = counter_dict.committed
+
+
+class DistributionMetrics(Metric):

Review comment:
I understand your point: calculating a single value might "save" some space in BigQuery. Unfortunately, distribution metrics don't support the median or other percentiles out of the box (maybe that's a good idea for a feature request?), and I'm not sure it's a good idea to introduce such logic in a util for reading metrics. The main reason is that when we save external metrics (ones not expected by MetricsReader) whose meaning we don't know, but which may still be valuable to others (e.g. the tfx team), computing the median or any other aggregate may lose meaningful information: whoever defined those external metrics might need the max, min, sum, or mean instead. Do you agree?

As for time windows, distributions don't store those either. It is possible to retrieve the name of the step in which a metric was collected, and I can add that to the metric label we save. This is a confusing problem.

To sum everything up:
- What should we do with metrics that weren't collected in our pipeline but somewhere deeper? It was decided to save them, but I can suggest here that a pipeline option called `save_external_metrics` would help. WDYT?
- In what shape should we save metrics that we know nothing about? IMO as raw as possible, because we don't know what is useful for the user who collects them.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 278193)
Time Spent: 5h 20m (was: 5h 10m)

> Save correctly Python Load Tests metrics according to its namespace
> -------------------------------------------------------------------
>
> Key: BEAM-7528
> URL: https://issues.apache.org/jira/browse/BEAM-7528
> Project: Beam
> Issue Type: Bug
> Components: testing
> Reporter: Kasia Kucharczyk
> Assignee: Kasia Kucharczyk
> Priority: Major
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> Bug discovered when metrics monitored more than one distribution and saved
> all as `runtime`.

This message was sent by Atlassian JIRA (v7.6.14#76016)
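The "save as raw as possible" approach discussed in the review could be sketched roughly as follows. This is a hypothetical illustration, not the actual Beam implementation: the `Metric` base class mirrors the pattern visible in the diff, and the `DistResult` stand-in only imitates the min/max/sum/count fields a distribution result would carry, so downstream consumers remain free to derive whichever aggregate (mean, or anything else) they need.

```python
import time
from collections import namedtuple


class Metric(object):
    """Simplified stand-in for the Metric base class shown in the diff."""
    def __init__(self, submit_timestamp, metric_id):
        self.submit_timestamp = submit_timestamp
        self.metric_id = metric_id


class DistributionMetric(Metric):
    """Stores a distribution as raw components (min, max, sum, count)
    instead of one pre-computed aggregate, so no information is lost."""
    def __init__(self, dist_result, submit_timestamp, metric_id, label):
        super(DistributionMetric, self).__init__(submit_timestamp, metric_id)
        self.label = label
        self.min = dist_result.min
        self.max = dist_result.max
        self.sum = dist_result.sum
        self.count = dist_result.count

    def as_dict(self):
        # One BigQuery row per distribution, with all raw components kept.
        return {
            'submit_timestamp': self.submit_timestamp,
            'metric_id': self.metric_id,
            'label': self.label,
            'min': self.min,
            'max': self.max,
            'sum': self.sum,
            'count': self.count,
        }


# Usage with a stand-in distribution result (real code would use the
# result objects returned by querying Beam pipeline metrics):
DistResult = namedtuple('DistResult', ['min', 'max', 'sum', 'count'])
metric = DistributionMetric(DistResult(1, 9, 20, 4), time.time(),
                            'test-1', 'runtime')
row = metric.as_dict()
# A consumer can still derive the mean as row['sum'] / row['count'].
```

Because the row keeps `sum` and `count` separately, any consumer that does want an aggregate can compute it later, which is exactly the argument the comment makes against aggregating in the metrics-reading util.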