I discovered something while trying to update test_progress_metrics
<https://github.com/apache/beam/blob/5201fa91cbf40d1730e1b2fb62bcdb4bce5ca0eb/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L361>
in fn_api_runner_test.py to inspect the returned MonitoringInfos in
addition to the already returned metrics format.

The element count metric appears to be added twice under the None tag, with
the second write overwriting the first. I am not sure whether that is
intentional. Please let me know if this is meant to overwrite what is
supposed to be the same metric, or if something might be wrong here.

See the use of element count metrics:

   1. In the assignment ending in
      [str(tag)] = receiver.opcounter.element_counter.value()
   the element count metric is added using the self.tagged_receivers tags
   in the DoOperation. The tag can be the value 'None'.
   2. Here
   <https://github.com/apache/beam/blob/b532b38958527529bf561c92d34b1f1230213395/sdks/python/apache_beam/runners/worker/operations.py#L186>
   the ONLY_OUTPUT tag is used and overridden later.
      1. Then fix_output_tags
      <https://github.com/apache/beam/blob/5201fa91cbf40d1730e1b2fb62bcdb4bce5ca0eb/sdks/python/apache_beam/runners/worker/bundle_processor.py#L328>
      in bundle_processor.py assigns the tag, which in this case is None
      again.

When the second instance of the metric is added it gets overwritten in
the output_element_counts (because it uses the same key). Is it intentional
to overwrite the metric?
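
To make the overwrite concrete, here is a minimal sketch of what I believe is
happening. It uses a plain dict as a stand-in for output_element_counts; the
helper record_element_count is hypothetical and is not Beam's actual code.

```python
# Hypothetical stand-in for the output_element_counts map (not Beam's code).
output_element_counts = {}


def record_element_count(tag, value):
    # The key is built with str(tag), so a tag of None becomes the string 'None'.
    output_element_counts[str(tag)] = value


record_element_count(None, 5)  # first registration of the metric
record_element_count(None, 5)  # second registration replaces the first entry

# Only one entry survives, because both writes used the same key.
print(output_element_counts)
```

With a map keyed by tag, the second write is invisible; with a list of
entries, it shows up as a duplicate.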

I discovered that the metric was created twice because I am not using a
map of tags; I just add another entry each time the metric is added as a
monitoring_info, so the second addition is not overwritten.

So if this is intentional, then I need to make my code do the equivalent
thing, and check that there is already a MonitoringInfo for the metric and
update its value (or assert it is the same value).
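
A rough sketch of what that equivalent check could look like on my side. The
class FakeMonitoringInfo, the function add_or_check, and the URN string are
all placeholders I made up for illustration; they are not the real
MonitoringInfo proto or any Beam API.

```python
from dataclasses import dataclass


@dataclass
class FakeMonitoringInfo:
    # Placeholder fields; the real MonitoringInfo proto is structured differently.
    urn: str
    tag: str
    value: int


def add_or_check(infos, new_info):
    """Add new_info keyed by (urn, tag), or verify it matches the existing entry."""
    key = (new_info.urn, new_info.tag)
    existing = infos.get(key)
    if existing is None:
        infos[key] = new_info
    elif existing.value != new_info.value:
        # Same metric reported twice with different values: surface it loudly.
        raise ValueError('conflicting values for %r: %d vs %d'
                         % (key, existing.value, new_info.value))
    return infos


infos = {}
add_or_check(infos, FakeMonitoringInfo('example:element_count', 'None', 5))
add_or_check(infos, FakeMonitoringInfo('example:element_count', 'None', 5))
```

This keeps a single entry per metric and tag, and fails if the two additions
ever disagree, which would tell us the overwrite is hiding a real difference.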

Also, is it intentional to use None as a tag name here? Seems like an odd
choice.

Thanks
Alex
