[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=154064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154064 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 12/Oct/18 23:58
Start Date: 12/Oct/18 23:58
Worklog Time Spent: 10m
Work Description: pabloem closed pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/model/fn-execution/src/main/proto/beam_fn_api.proto b/model/fn-execution/src/main/proto/beam_fn_api.proto
index a0795a7c285..915686de6b3 100644
--- a/model/fn-execution/src/main/proto/beam_fn_api.proto
+++ b/model/fn-execution/src/main/proto/beam_fn_api.proto
@@ -40,6 +40,7 @@ option java_outer_classname = "BeamFnApi";

 import "beam_runner_api.proto";
 import "endpoints.proto";
+import "google/protobuf/descriptor.proto";
 import "google/protobuf/timestamp.proto";
 import "google/protobuf/wrappers.proto";

@@ -250,11 +251,16 @@ message ProcessBundleRequest {
 message ProcessBundleResponse {
   // (Optional) If metrics reporting is supported by the SDK, this represents
   // the final metrics to record for this bundle.
+  // DEPRECATED
   Metrics metrics = 1;

   // (Optional) Specifies that the bundle has been split since the last
   // ProcessBundleProgressResponse was sent.
   BundleSplit split = 2;
+
+  // (Required) The list of metrics or other MonitoredState
+  // collected while processing this bundle.
+  repeated MonitoringInfo monitoring_infos = 3;
 }

 // A request to report progress information for a given bundle.
@@ -275,9 +281,9 @@ message MonitoringInfo {
   // Sub types like field formats - int64, double, string.
   // Aggregation methods - SUM, LATEST, TOP-N, BOTTOM-N, DISTRIBUTION
   // valid values are:
-  // beam:metrics:[SumInt64|LatestInt64|Top-NInt64|Bottom-NInt64|
-  // SumDouble|LatestDouble|Top-NDouble|Bottom-NDouble|DistributionInt64|
-  // DistributionDouble|MonitoringDataTable]
+  // beam:metrics:[sum_int_64|latest_int_64|top_n_int_64|bottom_n_int_64|
+  // sum_double|latest_double|top_n_double|bottom_n_double|
+  // distribution_int_64|distribution_double|monitoring_data_table
   string type = 2;

   // The Metric or monitored state.
@@ -302,6 +308,45 @@ message MonitoringInfo {
   // Some systems such as Stackdriver will be able to aggregate the metrics
   // using a subset of the provided labels
   map<string, string> labels = 5;
+
+  // The walltime of the most recent update.
+  // Useful for aggregation for Latest types such as LatestInt64.
+  google.protobuf.Timestamp timestamp = 6;
+}
+
+message MonitoringInfoUrns {
+  enum Enum {
+    USER_COUNTER_URN_PREFIX = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+        "beam:metric:user"];
+
+    ELEMENT_COUNT = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+        "beam:metric:element_count:v1"];
+
+    START_BUNDLE_MSECS = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+        "beam:metric:pardo_execution_time:start_bundle_msecs:v1"];
+
+    PROCESS_BUNDLE_MSECS = 3 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+        "beam:metric:pardo_execution_time:process_bundle_msecs:v1"];
+
+    FINISH_BUNDLE_MSECS = 4 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+        "beam:metric:pardo_execution_time:finish_bundle_msecs:v1"];
+
+    TOTAL_MSECS = 5 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+        "beam:metric:ptransform_execution_time:total_msecs:v1"];
+  }
+}
+
+message MonitoringInfoTypeUrns {
+  enum Enum {
+    SUM_INT64_TYPE = 0 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+        "beam:metrics:sum_int_64"];
+
+    DISTRIBUTION_INT64_TYPE = 1 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+        "beam:metrics:distribution_int_64"];
+
+    LATEST_INT64_TYPE = 2 [(org.apache.beam.model.pipeline.v1.beam_urn) =
+        "beam:metrics:latest_int_64"];
+  }
 }

 message Metric {
@@ -525,12 +570,16 @@ message Metrics {
 }

 message ProcessBundleProgressResponse {
-  // (Required)
+  // DEPRECATED (Required)
   Metrics metrics = 1;

   // (Optional) Specifies that the bundle has been split since the last
   // ProcessBundleProgressResponse was sent.
   BundleSplit split = 2;
+
+  // (Required) The list of metrics or other MonitoredState
+  // collected while processing this bundle.
+  repeated MonitoringInfo monitoring_infos = 3;
 }

 message ProcessBundleSplitRequest {
@@ -795,7 +844,6 @@ message LogEntry {
   enum Enum {
     // Unspecified level information. Will be logged at the TRACE level.
     UNSPECIFIED = 0;
-
     // Trace
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=154063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154063 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 12/Oct/18 23:58
Start Date: 12/Oct/18 23:58
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#issuecomment-429492504

Okay, as this looks good, I'll go ahead and merge.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 154063)
Time Spent: 8h 20m (was: 8h 10m)

> Update existing metrics in the FN API to use new Metric Schema
> --
>
> Key: BEAM-4374
> URL: https://issues.apache.org/jira/browse/BEAM-4374
> Project: Beam
> Issue Type: New Feature
> Components: beam-model
> Reporter: Alex Amato
> Priority: Major
> Time Spent: 8h 20m
> Remaining Estimate: 0h
>
> Update existing metrics to use the new proto and cataloging schema defined in:
> [_https://s.apache.org/beam-fn-api-metrics_]
> * Check in new protos
> * Define catalog file for metrics
> * Port existing metrics to use this new format, based on catalog names+metadata

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=154047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154047 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 12/Oct/18 22:10
Start Date: 12/Oct/18 22:10
Worklog Time Spent: 10m
Work Description: ajamato commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#issuecomment-429476625

Squashed all the commits. FYI, I imported this PR and the internal Google tests are also passing.
@robertwb, happy to iterate more on your suggestions, but we would like to submit this PR and finish up this iteration.

Issue Time Tracking
---

Worklog Id: (was: 154047)
Time Spent: 8h 10m (was: 8h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=153805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153805 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 12/Oct/18 07:42
Start Date: 12/Oct/18 07:42
Worklog Time Spent: 10m
Work Description: robertwb commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#discussion_r224696711

## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ##
@@ -489,46 +529,65 @@ def test_progress_metrics(self):
             beam.pvalue.TaggedOutput('twice', x)]))
     res = p.run()
     res.wait_until_finish()
-    try:

Review comment:
Keep these tests as well until we delete the legacy reporting?

Issue Time Tracking
---

Worklog Id: (was: 153805)
Time Spent: 8h (was: 7h 50m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=153803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153803 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 12/Oct/18 07:40
Start Date: 12/Oct/18 07:40
Worklog Time Spent: 10m
Work Description: robertwb commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#discussion_r224693651

## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ##
@@ -313,6 +313,14 @@ def metrics(self):
             self._fix_output_tags(transform_id, op.progress_metrics())
             for transform_id, op in self.ops.items()})

+  def monitoring_infos(self):

Review comment:
Testing the bundle processor requires providing a properly formed execution tree. The easiest way *by far* to correctly construct such a tree is by using the user-facing API (which, for the moment, is also significantly more stable). Though some simplification can be done (especially once the legacy worker goes away), the inherent inter-dependencies here will mean that this will remain true.

This is also in line with the mantra of testing user-facing behavior rather than implementation details. Of course sometimes one needs more focused implementation detail tests. But often tests that depend heavily on mocks of complicated systems tend to be brittle, require constant updates, often lack coverage, and even get out of sync with real-world use.

Put another way, the fn_api_runner_tests are not true end-to-end tests; the fn_api_runner was originally written to be a testing fake to aid the development of the fn_api worker (and frontend). The fact that we're now using it as a direct runner is just because it's so much faster and allows us to share more code with real portable runners.

Issue Time Tracking
---

Worklog Id: (was: 153803)
Time Spent: 7h 50m (was: 7h 40m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=153797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153797 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 12/Oct/18 07:29
Start Date: 12/Oct/18 07:29
Worklog Time Spent: 10m
Work Description: robertwb commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#discussion_r224693651

## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ##
@@ -313,6 +313,14 @@ def metrics(self):
             self._fix_output_tags(transform_id, op.progress_metrics())
             for transform_id, op in self.ops.items()})

+  def monitoring_infos(self):

Review comment:
Testing the bundle processor requires providing a properly formed execution tree. The easiest way by far to construct such a tree is by using the user-facing API (which, for the moment, is also significantly more stable).

This is also in line with the mantra of testing user-facing behavior rather than implementation details. Of course sometimes one needs more focused implementation detail tests.

Issue Time Tracking
---

Worklog Id: (was: 153797)
Time Spent: 7h 40m (was: 7.5h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=153788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153788 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 12/Oct/18 07:21
Start Date: 12/Oct/18 07:21
Worklog Time Spent: 10m
Work Description: robertwb commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#discussion_r224691794

## File path: model/fn-execution/src/main/proto/beam_fn_api.proto ##
@@ -302,6 +308,45 @@ message MonitoringInfo {
   // Some systems such as Stackdriver will be able to aggregate the metrics
   // using a subset of the provided labels
   map<string, string> labels = 5;
+
+  // The walltime of the most recent update.
+  // Useful for aggregation for Latest types such as LatestInt64.
+  google.protobuf.Timestamp timestamp = 6;
+}
+
+message MonitoringInfoUrns {
+  enum Enum {

Review comment:
I wonder if we should consider using the yaml file as the source of truth, rather than having redundancy here.

Issue Time Tracking
---

Worklog Id: (was: 153788)
Time Spent: 7.5h (was: 7h 20m)
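One way to read the suggestion above is that a single catalog could drive both the URN constants and the construction of monitoring infos. The following is only a rough sketch of that idea, not what the PR does: the `CATALOG` list stands in for a parsed YAML file, its `name`/`urn`/`type` schema and the `make_info` helper are hypothetical, and pairing these two URNs with `beam:metrics:sum_int_64` is an assumption made for illustration. Only the URN strings themselves are taken from the diff.

```python
# Hypothetical catalog, standing in for the contents of a parsed YAML file.
# The URN strings appear in the diff; the entry schema and the choice of
# sum_int_64 as the type for each entry are assumptions.
CATALOG = [
    {'name': 'ELEMENT_COUNT',
     'urn': 'beam:metric:element_count:v1',
     'type': 'beam:metrics:sum_int_64'},
    {'name': 'TOTAL_MSECS',
     'urn': 'beam:metric:ptransform_execution_time:total_msecs:v1',
     'type': 'beam:metrics:sum_int_64'},
]

# Lookup tables derived from the catalog instead of hand-written constants.
URN_BY_NAME = {entry['name']: entry['urn'] for entry in CATALOG}
TYPE_BY_URN = {entry['urn']: entry['type'] for entry in CATALOG}


def make_info(name, value, labels=None):
  """Builds a plain-dict stand-in for a MonitoringInfo from a catalog name."""
  urn = URN_BY_NAME[name]
  return {
      'urn': urn,
      'type': TYPE_BY_URN[urn],
      'labels': dict(labels or {}),
      'value': value,
  }


info = make_info('ELEMENT_COUNT', 3, {'PTRANSFORM': 'step1'})
```

With a layout like this, adding a metric would mean adding one catalog entry rather than a new enum value plus a new Python constant.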
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=153787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153787 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 12/Oct/18 07:20
Start Date: 12/Oct/18 07:20
Worklog Time Spent: 10m
Work Description: robertwb commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#discussion_r224691628

## File path: sdks/python/apache_beam/metrics/monitoring_infos.py ##
@@ -0,0 +1,246 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# cython: language_level=3
+# cython: profile=True
+
+from __future__ import absolute_import
+
+import time
+
+from google.protobuf import timestamp_pb2
+
+from apache_beam.metrics.cells import DistributionData
+from apache_beam.metrics.cells import DistributionResult
+from apache_beam.metrics.cells import GaugeData
+from apache_beam.metrics.cells import GaugeResult
+from apache_beam.portability import common_urns
+from apache_beam.portability.api.beam_fn_api_pb2 import CounterData
+from apache_beam.portability.api.beam_fn_api_pb2 import Metric
+from apache_beam.portability.api.beam_fn_api_pb2 import MonitoringInfo
+
+ELEMENT_COUNT_URN = common_urns.monitoring_infos.ELEMENT_COUNT.urn
+START_BUNDLE_MSECS_URN = common_urns.monitoring_infos.START_BUNDLE_MSECS.urn
+PROCESS_BUNDLE_MSECS_URN = common_urns.monitoring_infos.PROCESS_BUNDLE_MSECS.urn
+FINISH_BUNDLE_MSECS_URN = common_urns.monitoring_infos.FINISH_BUNDLE_MSECS.urn
+TOTAL_MSECS_URN = common_urns.monitoring_infos.TOTAL_MSECS.urn
+USER_COUNTER_URN_PREFIX = (
+    common_urns.monitoring_infos.USER_COUNTER_URN_PREFIX.urn)
+
+# TODO(ajamato): Implement the remaining types, i.e. Double types
+# Extrema types, etc. See:
+# https://s.apache.org/beam-fn-api-metrics
+SUM_INT64_TYPE = common_urns.monitoring_info_types.SUM_INT64_TYPE.urn
+DISTRIBUTION_INT64_TYPE = (
+    common_urns.monitoring_info_types.DISTRIBUTION_INT64_TYPE.urn)
+LATEST_INT64_TYPE = common_urns.monitoring_info_types.LATEST_INT64_TYPE.urn
+
+COUNTER_TYPES = set([SUM_INT64_TYPE])
+DISTRIBUTION_TYPES = set([DISTRIBUTION_INT64_TYPE])
+GAUGE_TYPES = set([LATEST_INT64_TYPE])
+
+
+def to_timestamp_proto(timestamp_secs):
+  """Converts seconds since epoch to a google.protobuf.Timestamp.
+
+  Args:
+    timestamp_secs: The timestamp in seconds since epoch.
+  """
+  seconds = int(timestamp_secs)
+  nanos = int((timestamp_secs - seconds) * 10**9)
+  return timestamp_pb2.Timestamp(seconds=seconds, nanos=nanos)
+
+
+def to_timestamp_secs(timestamp_proto):
+  """Converts a google.protobuf.Timestamp to seconds since epoch.
+
+  Args:
+    timestamp_proto: The google.protobuf.Timestamp.
+  """
+  return timestamp_proto.seconds + timestamp_proto.nanos * 10**-9
+
+
+def extract_counter_value(monitoring_info_proto):
+  """Returns the int counter value of the monitoring info."""
+  if is_counter(monitoring_info_proto) or is_gauge(monitoring_info_proto):
+    return monitoring_info_proto.metric.counter_data.int64_value
+  return None
+
+
+def extract_distribution(monitoring_info_proto):
+  """Returns the relevant DistributionInt64 or DistributionDouble.
+
+  Args:
+    monitoring_info_proto: The monitoring info for the distribution.
+  """
+  if is_distribution(monitoring_info_proto):
+    return monitoring_info_proto.metric.distribution_data.int_distribution_data
+  return None
+
+
+def create_labels(ptransform='', tag=''):
+  """Create the label dictionary based on the provided tags.
+
+  Args:
+    ptransform: The ptransform/step name.
+    tag: The output tag name, used as a label.
+  """
+  labels = {}
+  if tag:
+    labels['TAG'] = tag
+  if ptransform:
+    labels['PTRANSFORM'] = ptransform
+  return labels
+
+
+def int64_counter(urn, metric, ptransform='', tag=''):

Review comment:
This code feels very repetitive; isn't this exactly the kind of thing we should be pulling out of the yaml file?
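To see what the timestamp and label helpers in the hunk above compute, here is a small standalone sketch that mirrors their arithmetic without depending on Beam or protobuf. The `Timestamp` namedtuple is a hypothetical stand-in for `google.protobuf.timestamp_pb2.Timestamp`, the function bodies are copied from the diff under slightly different names, and the sample value is arbitrary.

```python
from collections import namedtuple

# Hypothetical stand-in for google.protobuf.timestamp_pb2.Timestamp, so the
# sketch runs with the standard library only.
Timestamp = namedtuple('Timestamp', ['seconds', 'nanos'])


def to_timestamp(timestamp_secs):
  """Splits float seconds since epoch into whole seconds and nanoseconds."""
  seconds = int(timestamp_secs)
  nanos = int((timestamp_secs - seconds) * 10**9)
  return Timestamp(seconds=seconds, nanos=nanos)


def to_timestamp_secs(timestamp):
  """Recombines seconds and nanoseconds into float seconds since epoch."""
  return timestamp.seconds + timestamp.nanos * 10**-9


def create_labels(ptransform='', tag=''):
  """Builds the label dict; empty arguments produce no entry."""
  labels = {}
  if tag:
    labels['TAG'] = tag
  if ptransform:
    labels['PTRANSFORM'] = ptransform
  return labels


ts = to_timestamp(1539388680.25)        # arbitrary sample value
roundtrip = to_timestamp_secs(ts)       # float, so only approximately equal
labels = create_labels(ptransform='step1')
```

Because the round trip goes through floating point, comparisons should use a tolerance rather than exact equality. Note also that `int()` truncates toward zero, so a negative fractional input would yield negative nanos, which the protobuf Timestamp type does not allow; the sketch, like the hunk, assumes non-negative timestamps.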
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=150184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-150184 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 01/Oct/18 21:46
Start Date: 01/Oct/18 21:46
Worklog Time Spent: 10m
Work Description: pabloem commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#discussion_r221766913

## File path: sdks/python/apache_beam/metrics/cells.py ##
@@ -161,6 +161,14 @@ def get_cumulative(self):
     with self._lock:
       return self.value

+  def to_runner_api_monitoring_info(self):

Review comment:
Can you add a TODO and a small explanation of this?

Issue Time Tracking
---

Worklog Id: (was: 150184)
Time Spent: 7h 10m (was: 7h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=149212&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-149212 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 28/Sep/18 16:18
Start Date: 28/Sep/18 16:18
Worklog Time Spent: 10m
Work Description: ajamato commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#issuecomment-425488762

I have fixed up all the test issues in this PR now; it's ready for more review.

Issue Time Tracking
---

Worklog Id: (was: 149212)
Time Spent: 7h (was: 6h 50m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=147770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147770 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 25/Sep/18 20:33
Start Date: 25/Sep/18 20:33
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#issuecomment-424491591

I think the test failures that you see in py precommit should be solved by rebasing.

Issue Time Tracking
---

Worklog Id: (was: 147770)
Time Spent: 6h 50m (was: 6h 40m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=147384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147384 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 24/Sep/18 23:41
Start Date: 24/Sep/18 23:41
Worklog Time Spent: 10m
Work Description: ajamato commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#issuecomment-424160596

> @ajamato What are the deprecated metrics you mention in PR description? I do not see them in the design doc.

I don't mean that specific metrics are deprecated, but that the technique for passing metrics across the FN API is deprecated. This PR replaces how the metrics are inserted into the ProcessBundleResponse.

Issue Time Tracking
---

Worklog Id: (was: 147384)
Time Spent: 6h 40m (was: 6.5h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=147383&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-147383 ]

ASF GitHub Bot logged work on BEAM-4374:

Author: ASF GitHub Bot
Created on: 24/Sep/18 23:39
Start Date: 24/Sep/18 23:39
Worklog Time Spent: 10m
Work Description: ajamato commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python.
URL: https://github.com/apache/beam/pull/6205#issuecomment-424160224

> @ajamato A more general question for my knowledge: the communication between runner harness and SDK harness is done on a bundle basis. When the runner harness sends data to the SDK harness to execute a transform that contains metrics, does it:
>
> 1. send metrics values (for the ones defined in the transform) alongside the data and receive an updated value of the metrics when the bundle is finished processing?
> 2. or does it send only the data, and the SDK harness responds with a diff value of the metrics so that the runner can update them on its side?
>
> My bet is option 2. But can you confirm?

Option 2: the metric is calculated for the bundle. That is, we send a single metric update for the bundle. The design expects that upstream systems will handle the aggregation; they may sum all the metrics together, outside of the SDK. The runner should forward these metrics to a metric aggregation system, which aggregates them to a final value across all the workers.

Issue Time Tracking
---

Worklog Id: (was: 147383)
Time Spent: 6.5h (was: 6h 20m)
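The per-bundle reporting described in this reply can be illustrated with a small runner-side aggregation sketch. It is a simplified assumption about how such aggregation might look, not Beam's implementation: `BundleUpdate` is a hypothetical flattened stand-in for the MonitoringInfo proto, `aggregate` is an illustrative helper, and the `my_gauge` URN suffix is made up. The type URNs, the `element_count` URN, the user URN prefix, and the `PTRANSFORM` label key come from the PR diff. Sum types are added together, and latest types keep the value with the most recent timestamp, which is the use the proto comment gives for the timestamp field.

```python
from collections import namedtuple

SUM_INT64_TYPE = 'beam:metrics:sum_int_64'
LATEST_INT64_TYPE = 'beam:metrics:latest_int_64'

# Hypothetical flattened stand-in for one MonitoringInfo reported for a bundle.
BundleUpdate = namedtuple(
    'BundleUpdate', ['urn', 'type', 'labels', 'value', 'timestamp'])


def aggregate(updates):
  """Combines per-bundle updates into one (value, timestamp) per metric key."""
  totals = {}
  for update in updates:
    if update.type not in (SUM_INT64_TYPE, LATEST_INT64_TYPE):
      raise ValueError('Unsupported type URN: %s' % update.type)
    # A metric is identified by its URN plus its full label set.
    key = (update.urn, tuple(sorted(update.labels.items())))
    if key not in totals:
      totals[key] = (update.value, update.timestamp)
      continue
    value, timestamp = totals[key]
    if update.type == SUM_INT64_TYPE:
      # Counters: add the bundle's contribution to the running total.
      totals[key] = (value + update.value, max(timestamp, update.timestamp))
    elif update.timestamp >= timestamp:
      # Gauges: keep only the most recently written value.
      totals[key] = (update.value, update.timestamp)
  return totals


step = {'PTRANSFORM': 'step1'}
updates = [
    BundleUpdate('beam:metric:element_count:v1', SUM_INT64_TYPE, step, 3, 10.0),
    BundleUpdate('beam:metric:element_count:v1', SUM_INT64_TYPE, step, 5, 12.0),
    # 'my_gauge' is a made-up name appended to the user URN prefix.
    BundleUpdate('beam:metric:user:my_gauge', LATEST_INT64_TYPE, step, 7, 13.0),
    BundleUpdate('beam:metric:user:my_gauge', LATEST_INT64_TYPE, step, 9, 11.0),
]
totals = aggregate(updates)
```

Keying on the URN together with the sorted labels keeps updates for different steps or output tags separate, while updates for the same metric from different bundles or workers fold into a single value.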
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=142637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-142637 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 10/Sep/18 09:00 Start Date: 10/Sep/18 09:00 Worklog Time Spent: 10m Work Description: echauchot commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#issuecomment-419840369 @ajamato A more general question for my knowledge: as the communication between runner harness and SDK harness is done on a bundle basis. When the runner harness sends data to the sdk harness to execute a transform that contains metrics, does it: 1. send metrics values (for the ones defined in the transform) alongside with data and receive an updated value of the metrics when the bundle is finished processing? 2. or does it send only the data and the sdk harness responds with a diff value of the metrics so that the runner can update them in its side? My bet is option 2. But can you confirm? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 142637) Time Spent: 6h 20m (was: 6h 10m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=141693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-141693 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 06/Sep/18 09:52 Start Date: 06/Sep/18 09:52 Worklog Time Spent: 10m Work Description: echauchot commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#issuecomment-419035110 @ajamato What are the deprecated metrics you mention in PR description? I do not see them in the design doc. Issue Time Tracking --- Worklog Id: (was: 141693) Time Spent: 6h 10m (was: 6h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=141520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-141520 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 05/Sep/18 20:20 Start Date: 05/Sep/18 20:20 Worklog Time Spent: 10m Work Description: pabloem commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#issuecomment-418867827 Lint issue is breaking python precommits ^^ Issue Time Tracking --- Worklog Id: (was: 141520) Time Spent: 6h (was: 5h 50m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=141511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-141511 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 05/Sep/18 19:38 Start Date: 05/Sep/18 19:38 Worklog Time Spent: 10m Work Description: pabloem commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#issuecomment-418855599 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 141511) Time Spent: 5h 50m (was: 5h 40m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=140160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-140160 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 31/Aug/18 16:52 Start Date: 31/Aug/18 16:52 Worklog Time Spent: 10m Work Description: ajamato commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#issuecomment-417726103 retest this please Issue Time Tracking --- Worklog Id: (was: 140160) Time Spent: 5h 40m (was: 5.5h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=138497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138497 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 27/Aug/18 18:00 Start Date: 27/Aug/18 18:00 Worklog Time Spent: 10m Work Description: ajamato commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#issuecomment-416313324 Ready for another round of reviews. I'll investigate the presubmit in the meantime. Thank you all for the reviews. I have addressed all the comments. For some reason I can't reply to some of the threads in the UI though, but I have addressed everything I saw. Please let me know if I missed something @pabloem @echauchot @lukecwik @aaltay @angoenka Issue Time Tracking --- Worklog Id: (was: 138497) Time Spent: 5.5h (was: 5h 20m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=138490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138490 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 27/Aug/18 17:37 Start Date: 27/Aug/18 17:37 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r213054095 ## File path: sdks/python/apache_beam/runners/worker/operations.py ## @@ -189,6 +191,58 @@ def progress_metrics(self): else None))), user=self.metrics_container.to_runner_api()) + def monitoring_infos(self, transform_id): +"""Returns the list of MonitoringInfos collected by this operation.""" +return (self.execution_time_metrics(transform_id) + +self.element_count_metrics(transform_id) + +self.user_metrics(transform_id)) Review comment: I have made the FnApiResults have a metrics() and monitoring_metrics() as you have suggested. Please see updated FnApiMetrics as well This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 138490) Time Spent: 5h 20m (was: 5h 10m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=138474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138474 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 27/Aug/18 17:10 Start Date: 27/Aug/18 17:10 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r213046269 ## File path: sdks/python/apache_beam/runners/worker/operations.py ## @@ -189,6 +191,58 @@ def progress_metrics(self): else None))), user=self.metrics_container.to_runner_api()) + def monitoring_infos(self, transform_id): +"""Returns the list of MonitoringInfos collected by this operation.""" +return (self.execution_time_metrics(transform_id) + +self.element_count_metrics(transform_id) + +self.user_metrics(transform_id)) Review comment: Ack Issue Time Tracking --- Worklog Id: (was: 138474) Time Spent: 5h 10m (was: 5h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=138473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138473 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 27/Aug/18 17:10 Start Date: 27/Aug/18 17:10 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r213046143 ## File path: sdks/python/apache_beam/metrics/cells.py ## @@ -161,6 +161,14 @@ def get_cumulative(self): with self._lock: return self.value + def to_runner_api_monitoring_info(self): Review comment: I also added a way to get the non user metrics from the MetricResult and I am using that in a test Issue Time Tracking --- Worklog Id: (was: 138473) Time Spent: 5h (was: 4h 50m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=138470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138470 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 27/Aug/18 17:05 Start Date: 27/Aug/18 17:05 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r213044847 ## File path: sdks/python/apache_beam/metrics/monitoring_infos.py ## @@ -0,0 +1,216 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +# cython: language_level=3 +# cython: profile=True + +from __future__ import absolute_import +from apache_beam.portability.api import beam_fn_api_pb2 +from apache_beam.metrics.cells import DistributionData +from apache_beam.metrics.cells import DistributionResult +from apache_beam.metrics.cells import GaugeResult +from apache_beam.metrics.cells import GaugeData +from google.protobuf import timestamp_pb2 + +import time + +USER_COUNTER_URN_PREFIX = 'beam:metric:user:' +ELEMENT_COUNT_URN = 'beam:metric:element_count:v1' +START_BUNDLE_MSECS_URN = ( +'beam:metric:pardo_execution_time:start_bundle_msecs:v1') +PROCESS_BUNDLE_MSECS_URN = ( +'beam:metric:pardo_execution_time:process_bundle_msecs:v1') +FINISH_BUNDLE_MSECS_URN = ( +'beam:metric:pardo_execution_time:finish_bundle_msecs:v1') +TOTAL_MSECS_URN = ( +'beam:metric:ptransform_execution_time:total_msecs:v1') + +# TODO(ajamato): Implement the remaining types, i.e. Double types +# Extrema types, etc. See: +# https://s.apache.org/beam-fn-api-metrics +SUM_INT64_TYPE = 'beam:metrics:SumInt64' Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 138470) Time Spent: 4h 50m (was: 4h 40m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=138466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138466 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 27/Aug/18 17:02 Start Date: 27/Aug/18 17:02 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r213043872 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -313,6 +313,14 @@ def metrics(self): self._fix_output_tags(transform_id, op.progress_metrics()) for transform_id, op in self.ops.items()}) + def monitoring_infos(self): Review comment: Since I was not able to properly instantiate these classes I have tested the monitoring_infos by providing an accessor to the MetricResults and updated the fn_api_runner_test accordingly. As discussed in this thread: https://lists.apache.org/thread.html/6ad0d5446ffc08850ef684e3fc4e0c2d39ef58e9bad2b8419db02b41@%3Cdev.beam.apache.org%3E This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 138466) Time Spent: 4h 40m (was: 4.5h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134793 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 15/Aug/18 00:02 Start Date: 15/Aug/18 00:02 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210140370 ## File path: sdks/python/apache_beam/metrics/cells.py ## @@ -161,6 +161,14 @@ def get_cumulative(self): with self._lock: return self.value + def to_runner_api_monitoring_info(self): Review comment: Adding a CounterData class would make more sense but I would say this is out of scope of this change Issue Time Tracking --- Worklog Id: (was: 134793) Time Spent: 4.5h (was: 4h 20m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134784 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 23:17 Start Date: 14/Aug/18 23:17 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210133169 ## File path: model/fn-execution/src/main/proto/metric_definitions.yaml ## @@ -0,0 +1,57 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# Metrics Definitions describing various BEAM metrics. 
+# See: https://s.apache.org/beam-fn-api-metrics + +- annotations: +description: The total estimated execution time of the ptransform +unit: msecs + labels: + - PTRANSFORM + type: beam:metrics:SumInt64 + urn: beam:metric:ptransform_execution_time:total_msecs:v1 +- annotations: +description: The total estimated execution time of the start bundle function in + a pardo +unit: msecs + labels: + - PTRANSFORM + type: beam:metrics:SumInt64 + urn: beam:metric:pardo_execution_time:start_bundle_msecs:v1 +- annotations: +description: The total estimated execution time of the process bundle function + in a pardo +unit: msecs + labels: + - PTRANSFORM + type: beam:metrics:SumInt64 + urn: beam:metric:pardo_execution_time:process_bundle_msecs:v1 +- annotations: +description: The total estimated execution time of the finish bundle function + in a pardo +unit: msecs + labels: + - PTRANSFORM + type: beam:metrics:SumInt64 + urn: beam:metric:pardo_execution_time:finish_bundle_msecs:v1 +- annotations: +description: The total elements counted for a metric. + labels: + - PTRANSFORM + type: beam:metrics:SumInt64 + urn: beam:metric:element_count:v1 Review comment: This file is for system-defined ones, which we don't seem to have, as they are only used in user counters, unless I am missing something. Happy to add one if we have an example, perhaps in another SDK. Right now this is just a catalog; it's not actually loaded or used anywhere. The doc covers how to add more. 
Issue Time Tracking --- Worklog Id: (was: 134784) Time Spent: 4h 20m (was: 4h 10m)
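The review comment above notes that metric_definitions.yaml is only a catalog and is not loaded anywhere. As a rough sketch of what consulting such a catalog could look like, the snippet below copies two entries from the quoted YAML into Python literals and checks a metric's type and labels against them. The names `CATALOG`, `CATALOG_BY_URN` and `catalog_problems` are hypothetical and are not part of the PR or of Beam; the validation rule itself is an assumption.

```python
# Two entries copied from the quoted metric_definitions.yaml, as Python literals.
# CATALOG, CATALOG_BY_URN and catalog_problems are hypothetical names for illustration.
CATALOG = [
    {
        'urn': 'beam:metric:ptransform_execution_time:total_msecs:v1',
        'type': 'beam:metrics:SumInt64',
        'labels': ['PTRANSFORM'],
        'annotations': {
            'description': 'The total estimated execution time of the ptransform',
            'unit': 'msecs',
        },
    },
    {
        'urn': 'beam:metric:element_count:v1',
        'type': 'beam:metrics:SumInt64',
        'labels': ['PTRANSFORM'],
        'annotations': {
            'description': 'The total elements counted for a metric.',
        },
    },
]

# Index the catalog by URN so lookups do not scan the list.
CATALOG_BY_URN = {entry['urn']: entry for entry in CATALOG}


def catalog_problems(urn, type_urn, labels):
  """Returns a list of mismatches between a metric and its catalog entry."""
  entry = CATALOG_BY_URN.get(urn)
  if entry is None:
    return ['unknown urn: %s' % urn]
  problems = []
  if type_urn != entry['type']:
    problems.append('expected type %s, got %s' % (entry['type'], type_urn))
  for name in entry['labels']:
    if name not in labels:
      problems.append('missing label: %s' % name)
  return problems


print(catalog_problems(
    'beam:metric:element_count:v1', 'beam:metrics:SumInt64',
    {'PTRANSFORM': 'step1'}))
```

An empty list means the metric matches its catalog entry; each string in a non-empty list names one mismatch.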
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134782 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 23:12 Start Date: 14/Aug/18 23:12 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210132412 ## File path: sdks/python/apache_beam/metrics/cells.py ## @@ -375,6 +383,14 @@ def from_runner_api(proto): float(proto.timestamp.nanos) / 10**9) return GaugeData(proto.value, timestamp=gauge_timestamp) + def to_runner_api_monitoring_info(self): +"""Returns a Metric with this value for use in a MonitoringInfo.""" +return beam_fn_api_pb2.Metric( +counter_data=beam_fn_api_pb2.CounterData( Review comment: No, because as a monitoring_info it should be encoded as a counter: gauges and counters use the same data format and differ only in the aggregation method (gauges are just "latest" counters). This is described in the proposal doc. 
Issue Time Tracking --- Worklog Id: (was: 134782) Time Spent: 4h 10m (was: 4h)
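To illustrate the point made in the review comment above, that counters and gauges share one int64 payload and differ only in how values are aggregated, here is a minimal sketch. `Int64Info` and `combine` are hypothetical stand-ins, not the Beam protos or any Beam API; the two type URNs are the ones quoted in the monitoring_infos.py diff, the gauge URN is made up, and the assumption that "latest" means "value with the greatest timestamp" is mine.

```python
from dataclasses import dataclass

# Type URNs as they appear in the quoted monitoring_infos.py diff.
SUM_INT64_TYPE = 'beam:metrics:sum_int_64'
LATEST_INT64_TYPE = 'beam:metrics:latest_int_64'


@dataclass(frozen=True)
class Int64Info:
  """Hypothetical stand-in for a MonitoringInfo with an int64 counter payload."""
  urn: str
  type_urn: str
  value: int
  timestamp_secs: float = 0.0


def combine(infos):
  """Combines same-URN infos using the aggregation named by their type URN."""
  if not infos:
    raise ValueError('nothing to combine')
  type_urns = {info.type_urn for info in infos}
  if len(type_urns) != 1:
    raise ValueError('mixed type URNs: %s' % sorted(type_urns))
  type_urn = infos[0].type_urn
  if type_urn == SUM_INT64_TYPE:
    # Counter: the values add up.
    return sum(info.value for info in infos)
  if type_urn == LATEST_INT64_TYPE:
    # Gauge: same payload, but only the most recent value survives (assumed rule).
    return max(infos, key=lambda info: info.timestamp_secs).value
  raise ValueError('unsupported type URN: %s' % type_urn)


counter_updates = [
    Int64Info('beam:metric:element_count:v1', SUM_INT64_TYPE, 3, 10.0),
    Int64Info('beam:metric:element_count:v1', SUM_INT64_TYPE, 4, 20.0),
]
# 'example:gauge' is a made-up URN used only for this illustration.
gauge_updates = [
    Int64Info('example:gauge', LATEST_INT64_TYPE, 3, 10.0),
    Int64Info('example:gauge', LATEST_INT64_TYPE, 4, 20.0),
]
print(combine(counter_updates))  # 7
print(combine(gauge_updates))    # 4
```

The same two payload values yield 7 under the sum type and 4 under the latest type, which is why the payload alone cannot tell a consumer how to aggregate.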
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134781&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134781 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 23:11 Start Date: 14/Aug/18 23:11 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210132238 ## File path: sdks/python/apache_beam/metrics/execution.py ## @@ -211,6 +215,31 @@ def to_runner_api(self): for k, v in self.gauges.items()] ) + def to_runner_api_monitoring_infos(self, transform_id): +"""Returns a list of MonitoringInfos for the metrics in this container.""" +all_user_metrics = [] +for k, v in self.counters.items(): + all_user_metrics.append(int64_counter( + user_metric_urn(k.namespace, k.name), Review comment: The type_urn defines the type of aggregation Issue Time Tracking --- Worklog Id: (was: 134781) Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134780 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 23:11 Start Date: 14/Aug/18 23:11 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210132162 ## File path: sdks/python/apache_beam/metrics/monitoring_infos.py ## @@ -0,0 +1,231 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +# cython: language_level=3 +# cython: profile=True + +from __future__ import absolute_import + +import time + +from google.protobuf import timestamp_pb2 + +from apache_beam.metrics.cells import DistributionData +from apache_beam.metrics.cells import DistributionResult +from apache_beam.metrics.cells import GaugeData +from apache_beam.metrics.cells import GaugeResult +from apache_beam.portability.api import beam_fn_api_pb2 + +USER_COUNTER_URN_PREFIX = 'beam:metric:user:' +ELEMENT_COUNT_URN = 'beam:metric:element_count:v1' +START_BUNDLE_MSECS_URN = ( +'beam:metric:pardo_execution_time:start_bundle_msecs:v1') +PROCESS_BUNDLE_MSECS_URN = ( +'beam:metric:pardo_execution_time:process_bundle_msecs:v1') +FINISH_BUNDLE_MSECS_URN = ( +'beam:metric:pardo_execution_time:finish_bundle_msecs:v1') +TOTAL_MSECS_URN = ( +'beam:metric:ptransform_execution_time:total_msecs:v1') + +# TODO(ajamato): Implement the remaining types, i.e. Double types +# Extrema types, etc. See: +# https://s.apache.org/beam-fn-api-metrics +SUM_INT64_TYPE = 'beam:metrics:sum_int_64' +DISTRIBUTION_INT64_TYPE = 'beam:metrics:distribution_int_64' +LATEST_INT64_TYPE = 'beam:metrics:latest_int_64' + + +def to_timestamp_proto(timestamp_secs): + """Converts seconds since epoch to a google.protobuf.Timestamp. + + Args: +timestamp_secs: The timestamp in seconds since epoch. + """ + seconds = int(timestamp_secs) + nanos = int((timestamp_secs - seconds) * 10**9) + return timestamp_pb2.Timestamp(seconds=seconds, nanos=nanos) + + +def to_timestamp_secs(timestamp_proto): + """Converts a google.protobuf.Timestamp to seconds since epoch. + + Args: +timestamp_proto: The google.protobuf.Timestamp. 
+ """ + return timestamp_proto.seconds + timestamp_proto.nanos * 10**-9 + + +def extract_counter_value(monitoring_info_proto): + """Returns the int counter value of the monitoring info.""" + if is_counter(monitoring_info_proto) or is_gauge(monitoring_info_proto): +return monitoring_info_proto.metric.counter_data.int64_value + return None + + +def extract_distribution(monitoring_info_proto): + """Returns the relevant DistributionInt64 or DistributionDouble. + + Args: +monitoring_info_proto: The monitoring info for the distribution. + """ + if is_distribution(monitoring_info_proto): +return monitoring_info_proto.metric.distribution_data.int_distribution_data + return None + + +def create_labels(ptransform='', tag=''): + """Create the label dictionary based on the provided tags. + + Args: +ptransform: The ptransform/step name. +tag: The output tag name, used as a label. + """ + labels = {} + if tag: +labels['TAG'] = tag + if ptransform: +labels['PTRANSFORM'] = ptransform + return labels + + +def int64_counter(urn, metric, ptransform='', tag=''): + """Return the counter monitoring info for the specified URN, metric and labels. + + Args: +urn: The URN of the monitoring info/metric. +metric: The metric proto field to use in the monitoring info. +Or an int value. +ptransform: The ptransform/step name used as a label. +tag: The output tag name, used as a label. + """ + labels = create_labels(ptransform=ptransform, tag=tag) + if isinstance(metric, int): +metric = beam_fn_api_pb2.Metric( +counter_data=beam_fn_api_pb2.CounterData( +int64_value=metric +) +) + return create_monitoring_info(urn, SUM_INT64_TYPE, metric, labels) + + +def int
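The seconds/nanos arithmetic used by the timestamp helpers in the quoted diff can be tried out without protobuf. The sketch below mirrors that arithmetic on plain tuples; `split_timestamp` and `join_timestamp` are hypothetical names introduced only for illustration.

```python
def split_timestamp(timestamp_secs):
  """Splits float seconds since epoch into (seconds, nanos).

  Uses the same arithmetic as to_timestamp_proto in the quoted diff:
  int() truncates toward zero, and the remainder is scaled to nanos.
  """
  seconds = int(timestamp_secs)
  nanos = int((timestamp_secs - seconds) * 10**9)
  return seconds, nanos


def join_timestamp(seconds, nanos):
  """Recombines (seconds, nanos) into float seconds since epoch."""
  return seconds + nanos * 10**-9
```

For non-negative inputs the round trip recovers the original value up to floating-point precision. Because `int()` truncates toward zero, a negative input such as `-1.5` yields a negative nanos component, whereas `google.protobuf.Timestamp` documents nanos as a non-negative value counting forward in time, so pre-epoch timestamps would need separate handling if they can occur.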
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134775 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 23:03 Start Date: 14/Aug/18 23:03 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210130780 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -1264,10 +1270,26 @@ def get(self, timeout=None): class FnApiMetrics(metrics.metric.MetricResults): - def __init__(self, step_metrics): + def __init__(self, step_metrics, step_monitoring_infos, + use_monitoring_infos=False, user_metrics_only=True): +"""Used for querying metrics from the PipelineResult object. + + step_metrics: The deprecated legacy metrics format. + step_monitoring_infos: The same metrics specified as MonitoringInfos. + use_monitoring_infos: If true, return the metrics based on the + step_monitoring_infos. + user_metrics_only: True if user metrics only, False if all metrics. +""" self._counters = {} self._distributions = {} self._gauges = {} +self.user_metrics_only = user_metrics_only +if use_monitoring_infos: + self._init_metrics_from_monitoring_infos(step_monitoring_infos) +else: + self._init_metrics_from_legacy_metrics(step_metrics) Review comment: We can get rid of it. It's only there as a precaution before flipping it on. The monitoring_infos match the functionality of the legacy metrics, which is what one of my tests checks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 134775) Time Spent: 3h 40m (was: 3.5h)
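The precaution described in the comment above (keeping both code paths until the new one is shown to match the legacy one) could be backed by a parity check along these lines. This is a minimal sketch, assuming both formats have already been reduced to plain `{(step, name): value}` dictionaries; `counters_match` is a hypothetical helper and is not part of the pull request.

```python
def counters_match(legacy_counters, monitoring_info_counters):
  """Compares counters extracted from the two metric formats.

  Both arguments are plain dicts keyed by (step, name). Returns a tuple
  (ok, diffs), where diffs maps each mismatching key to the pair
  (legacy_value, monitoring_info_value); a missing entry shows as None.
  """
  diffs = {}
  for key in set(legacy_counters) | set(monitoring_info_counters):
    old = legacy_counters.get(key)
    new = monitoring_info_counters.get(key)
    if old != new:
      diffs[key] = (old, new)
  return (not diffs, diffs)
```

Returning the differences rather than a bare boolean makes a failing comparison easier to diagnose before the legacy path is deleted.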
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134774 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 23:01 Start Date: 14/Aug/18 23:01 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210130540 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -1304,17 +1351,25 @@ def query(self, filter=None): class RunnerResult(runner.PipelineResult): - def __init__(self, state, metrics_by_stage): + def __init__(self, state, metrics_by_stage, monitoring_infos_by_stage): super(RunnerResult, self).__init__(state) self._metrics_by_stage = metrics_by_stage +self._monitoring_infos_by_stage = monitoring_infos_by_stage self._user_metrics = None def wait_until_finish(self, duration=None): return self._state - def metrics(self): + def metrics(self, use_monitoring_infos=False, user_metrics_only=True): Review comment: Based on the few tests I have done, I have matched the existing functionality, so we could go ahead and delete the old code if you would like. I just thought of this as a precautionary thing to do before flipping it on. I think it's okay to rename to RunnerResult.monitoring_infos. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 134774) Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134771 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 22:56 Start Date: 14/Aug/18 22:56 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210129620 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker.py ## @@ -223,6 +223,7 @@ def register(self, request, instruction_id): instruction_id=instruction_id, register=beam_fn_api_pb2.RegisterResponse()) + # Are these methods dead code? No. invoked via do_instruction Review comment: Removing. That was just meant to be a note for me. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 134771) Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134702 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 20:22 Start Date: 14/Aug/18 20:22 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210087299 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker.py ## @@ -241,15 +242,18 @@ def process_bundle(self, request, instruction_id): return beam_fn_api_pb2.InstructionResponse( instruction_id=instruction_id, process_bundle=beam_fn_api_pb2.ProcessBundleResponse( -metrics=processor.metrics())) +metrics=processor.metrics(), +monitoring_infos=processor.monitoring_infos())) + # Are these methods dead code? Review comment: ditto This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 134702) Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134701&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134701 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 20:22 Start Date: 14/Aug/18 20:22 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210087115 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker.py ## @@ -223,6 +223,7 @@ def register(self, request, instruction_id): instruction_id=instruction_id, register=beam_fn_api_pb2.RegisterResponse()) + # Are these methods dead code? No. invoked via do_instruction Review comment: These methods are called here https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/sdk_worker.py#L214 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 134701) Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134703&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134703 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 20:22 Start Date: 14/Aug/18 20:22 Worklog Time Spent: 10m Work Description: angoenka commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210090493 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -313,6 +313,14 @@ def metrics(self): self._fix_output_tags(transform_id, op.progress_metrics()) for transform_id, op in self.ops.items()}) + def monitoring_infos(self): Review comment: We don't have unit tests for bundle_processor. It would be a good idea to start one and add test cases as and when we modify bundle_processor. For now you can simply use a dummy operator and directly access bundle_processor.py fields to test it. The actual op.monitoring_infos can be checked separately in the unit tests for operators. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 134703) Time Spent: 3h 10m (was: 3h)
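The testing approach suggested in the comment above (a dummy operator plugged directly into the processor's fields) might look roughly like the following. This is a sketch under stated assumptions rather than the real test: `DummyOperation` and `ProcessorSketch` are hypothetical stand-ins, the `op.monitoring_infos(transform_id)` signature is assumed, and the real `bundle_processor.py` classes are deliberately not imported.

```python
import unittest


class DummyOperation(object):
  """Hypothetical stand-in for an operation; returns canned values."""

  def __init__(self, infos):
    self._infos = infos

  def monitoring_infos(self, transform_id):
    # Pair each canned value with the transform id it was requested for.
    return [(transform_id, info) for info in self._infos]


class ProcessorSketch(object):
  """Hypothetical reduction of a bundle processor to its `ops` mapping."""

  def __init__(self, ops):
    self.ops = ops

  def monitoring_infos(self):
    # Collect infos from every operation, in a deterministic order.
    all_infos = []
    for transform_id, op in sorted(self.ops.items()):
      all_infos.extend(op.monitoring_infos(transform_id))
    return all_infos


class ProcessorSketchTest(unittest.TestCase):

  def test_collects_infos_from_all_ops(self):
    processor = ProcessorSketch({
        'a': DummyOperation(['x']),
        'b': DummyOperation(['y', 'z']),
    })
    self.assertEqual(
        [('a', 'x'), ('b', 'y'), ('b', 'z')],
        processor.monitoring_infos())
```

Keeping the dummy operation trivial leaves the test focused on how the processor gathers results, while the content of each operation's monitoring infos stays covered by the operator tests, as the reviewer proposes.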
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134682 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 19:58 Start Date: 14/Aug/18 19:58 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210083749 ## File path: model/fn-execution/src/main/proto/metric_definitions.yaml ## @@ -0,0 +1,57 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# Metrics Definitions describing various BEAM metrics. 
+# See: https://s.apache.org/beam-fn-api-metrics + +- annotations: +description: The total estimated execution time of the ptransform +unit: msecs + labels: + - PTRANSFORM + type: beam:metrics:SumInt64 + urn: beam:metric:ptransform_execution_time:total_msecs:v1 +- annotations: +description: The total estimated execution time of the start bundle function in + a pardo +unit: msecs + labels: + - PTRANSFORM + type: beam:metrics:SumInt64 + urn: beam:metric:pardo_execution_time:start_bundle_msecs:v1 +- annotations: +description: The total estimated execution time of the process bundle function + in a pardo +unit: msecs + labels: + - PTRANSFORM + type: beam:metrics:SumInt64 + urn: beam:metric:pardo_execution_time:process_bundle_msecs:v1 +- annotations: +description: The total estimated execution time of the finish bundle function + in a pardo +unit: msecs + labels: + - PTRANSFORM + type: beam:metrics:SumInt64 + urn: beam:metric:pardo_execution_time:finish_bundle_msecs:v1 +- annotations: +description: The total elements counted for a metric. + labels: + - PTRANSFORM + type: beam:metrics:SumInt64 + urn: beam:metric:element_count:v1 Review comment: Can you add examples of other metrics? e.g. distribution / gauge types This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 134682) Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134678 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 19:56 Start Date: 14/Aug/18 19:56 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210063210 ## File path: sdks/python/apache_beam/metrics/monitoring_infos.py ## @@ -0,0 +1,231 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +# cython: language_level=3 +# cython: profile=True + +from __future__ import absolute_import + +import time + +from google.protobuf import timestamp_pb2 + +from apache_beam.metrics.cells import DistributionData +from apache_beam.metrics.cells import DistributionResult +from apache_beam.metrics.cells import GaugeData +from apache_beam.metrics.cells import GaugeResult +from apache_beam.portability.api import beam_fn_api_pb2 + +USER_COUNTER_URN_PREFIX = 'beam:metric:user:' +ELEMENT_COUNT_URN = 'beam:metric:element_count:v1' +START_BUNDLE_MSECS_URN = ( +'beam:metric:pardo_execution_time:start_bundle_msecs:v1') +PROCESS_BUNDLE_MSECS_URN = ( +'beam:metric:pardo_execution_time:process_bundle_msecs:v1') +FINISH_BUNDLE_MSECS_URN = ( +'beam:metric:pardo_execution_time:finish_bundle_msecs:v1') +TOTAL_MSECS_URN = ( +'beam:metric:ptransform_execution_time:total_msecs:v1') + +# TODO(ajamato): Implement the remaining types, i.e. Double types +# Extrema types, etc. See: +# https://s.apache.org/beam-fn-api-metrics +SUM_INT64_TYPE = 'beam:metrics:sum_int_64' +DISTRIBUTION_INT64_TYPE = 'beam:metrics:distribution_int_64' +LATEST_INT64_TYPE = 'beam:metrics:latest_int_64' + + +def to_timestamp_proto(timestamp_secs): + """Converts seconds since epoch to a google.protobuf.Timestamp. + + Args: +timestamp_secs: The timestamp in seconds since epoch. + """ + seconds = int(timestamp_secs) + nanos = int((timestamp_secs - seconds) * 10**9) + return timestamp_pb2.Timestamp(seconds=seconds, nanos=nanos) + + +def to_timestamp_secs(timestamp_proto): + """Converts a google.protobuf.Timestamp to seconds since epoch. + + Args: +timestamp_proto: The google.protobuf.Timestamp. 
+ """ + return timestamp_proto.seconds + timestamp_proto.nanos * 10**-9 + + +def extract_counter_value(monitoring_info_proto): + """Returns the int counter value of the monitoring info.""" + if is_counter(monitoring_info_proto) or is_gauge(monitoring_info_proto): +return monitoring_info_proto.metric.counter_data.int64_value + return None + + +def extract_distribution(monitoring_info_proto): + """Returns the relevant DistributionInt64 or DistributionDouble. + + Args: +monitoring_info_proto: The monitoring info for the distribution. + """ + if is_distribution(monitoring_info_proto): +return monitoring_info_proto.metric.distribution_data.int_distribution_data + return None + + +def create_labels(ptransform='', tag=''): + """Create the label dictionary based on the provided tags. + + Args: +ptransform: The ptransform/step name. +tag: The output tag name, used as a label. + """ + labels = {} + if tag: +labels['TAG'] = tag + if ptransform: +labels['PTRANSFORM'] = ptransform + return labels + + +def int64_counter(urn, metric, ptransform='', tag=''): + """Return the counter monitoring info for the specified URN, metric and labels. + + Args: +urn: The URN of the monitoring info/metric. +metric: The metric proto field to use in the monitoring info. +Or an int value. +ptransform: The ptransform/step name used as a label. +tag: The output tag name, used as a label. + """ + labels = create_labels(ptransform=ptransform, tag=tag) + if isinstance(metric, int): +metric = beam_fn_api_pb2.Metric( +counter_data=beam_fn_api_pb2.CounterData( +int64_value=metric +) +) + return create_monitoring_info(urn, SUM_INT64_TYPE, metric, labels) + + +def int
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134675 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 19:56 Start Date: 14/Aug/18 19:56 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210063727 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -1264,10 +1270,26 @@ def get(self, timeout=None): class FnApiMetrics(metrics.metric.MetricResults): - def __init__(self, step_metrics): + def __init__(self, step_metrics, step_monitoring_infos, + use_monitoring_infos=False, user_metrics_only=True): +"""Used for querying metrics from the PipelineResult object. + + step_metrics: The deprecated legacy metrics format. + step_monitoring_infos: The same metrics specified as MonitoringInfos. + use_monitoring_infos: If true, return the metrics based on the + step_monitoring_infos. + user_metrics_only: True if user metrics only, False if all metrics. +""" self._counters = {} self._distributions = {} self._gauges = {} +self.user_metrics_only = user_metrics_only +if use_monitoring_infos: + self._init_metrics_from_monitoring_infos(step_monitoring_infos) +else: + self._init_metrics_from_legacy_metrics(step_metrics) Review comment: If I understand correctly, we can't just get rid of the `if use_monitoring_infos == False` path because these are used by the Dataflow worker? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 134675) Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134672 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 19:56 Start Date: 14/Aug/18 19:56 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r209771842 ## File path: sdks/python/apache_beam/metrics/execution.py ## @@ -211,6 +215,31 @@ def to_runner_api(self): for k, v in self.gauges.items()] ) + def to_runner_api_monitoring_infos(self, transform_id): +"""Returns a list of MonitoringInfos for the metrics in this container.""" +all_user_metrics = [] +for k, v in self.counters.items(): + all_user_metrics.append(int64_counter( + user_metric_urn(k.namespace, k.name), Review comment: If I understand correctly, the kind of aggregation is not part of the urn of a counter? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 134672) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134680 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 19:56 Start Date: 14/Aug/18 19:56 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210083345 ## File path: sdks/python/apache_beam/metrics/execution.py ## @@ -211,6 +215,31 @@ def to_runner_api(self): for k, v in self.gauges.items()] ) + def to_runner_api_monitoring_infos(self, transform_id): +"""Returns a list of MonitoringInfos for the metrics in this container.""" +all_user_metrics = [] +for k, v in self.counters.items(): + all_user_metrics.append(int64_counter( + user_metric_urn(k.namespace, k.name), Review comment: Alex has explained this to me offline, btw. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 134680) Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134674 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 19:56 Start Date: 14/Aug/18 19:56 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210062545 ## File path: sdks/python/apache_beam/metrics/cells.py ## @@ -375,6 +383,14 @@ def from_runner_api(proto): float(proto.timestamp.nanos) / 10**9) return GaugeData(proto.value, timestamp=gauge_timestamp) + def to_runner_api_monitoring_info(self): +"""Returns a Metric with this value for use in a MonitoringInfo.""" +return beam_fn_api_pb2.Metric( +counter_data=beam_fn_api_pb2.CounterData( Review comment: The Metrics API does have a GaugeData proto. Should we use that here? Issue Time Tracking --- Worklog Id: (was: 134674) Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134677 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 19:56 Start Date: 14/Aug/18 19:56 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210066443 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -313,6 +313,14 @@ def metrics(self): self._fix_output_tags(transform_id, op.progress_metrics()) for transform_id, op in self.ops.items()}) + def monitoring_infos(self): Review comment: I'd get @angoenka to give his thoughts here. Issue Time Tracking --- Worklog Id: (was: 134677) Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134676&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134676 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 19:56 Start Date: 14/Aug/18 19:56 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210066242 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -1304,17 +1351,25 @@ def query(self, filter=None): class RunnerResult(runner.PipelineResult): - def __init__(self, state, metrics_by_stage): + def __init__(self, state, metrics_by_stage, monitoring_infos_by_stage): super(RunnerResult, self).__init__(state) self._metrics_by_stage = metrics_by_stage +self._monitoring_infos_by_stage = monitoring_infos_by_stage self._user_metrics = None def wait_until_finish(self, duration=None): return self._state - def metrics(self): + def metrics(self, use_monitoring_infos=False, user_metrics_only=True): Review comment: This call is meant to be for user metrics, not for all general metrics. I think I'd rather not change this call, because it's user-facing. Maybe we can provide the parameter in the constructor? For a test runner, I'm okay with having these parameters, but for a user-facing runner, I'd rather not. That being said, this is not backwards incompatible, so it's strictly possible to add it. Also, since `RunnerResult.metrics` is meant for user metrics, I think we should think about having a different function to return all job metrics. Something like `RunnerResult.monitoring_metrics`, or `RunnerResult.counters`, `RunnerResult.monitoring_infos`... or something like that. And perhaps, file `metrics` to return all sorts of metrics in Beam 3.. This is an automated message from the Apache Git Service. 
Issue Time Tracking --- Worklog Id: (was: 134676) Time Spent: 2h
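pabloem's suggestion in the message above, keeping `metrics()` for user metrics and exposing all job metrics through a separate accessor, might look roughly like the sketch below. `ResultSketch`, the dict-based entries, and the `'user'` flag are hypothetical stand-ins, not Beam's actual classes or protos; only the accessor name `monitoring_infos` is taken from the review comment.

```python
class ResultSketch(object):
  """Hypothetical stand-in for RunnerResult; not Beam's real class."""

  def __init__(self, monitoring_infos_by_stage):
    # Maps a stage name to a list of dicts standing in for MonitoringInfos.
    self._monitoring_infos_by_stage = monitoring_infos_by_stage

  def monitoring_infos(self):
    """Returns every collected entry, user and system alike."""
    return [info
            for infos in self._monitoring_infos_by_stage.values()
            for info in infos]

  def metrics(self):
    """User-facing call: signature unchanged, returns user metrics only."""
    return [info for info in self.monitoring_infos() if info['user']]


result = ResultSketch({
    'stage1': [
        {'urn': 'beam:metric:user:ns:my_counter', 'user': True},
        {'urn': 'beam:metric:element_count:v1', 'user': False},
    ],
})
```

With this split, tests that need system metrics would call `monitoring_infos()` while existing callers of `metrics()` see no signature change.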
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134673 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 19:56 Start Date: 14/Aug/18 19:56 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210067349 ## File path: sdks/python/apache_beam/runners/worker/operations.py ## @@ -189,6 +191,58 @@ def progress_metrics(self): else None))), user=self.metrics_container.to_runner_api()) + def monitoring_infos(self, transform_id): +"""Returns the list of MonitoringInfos collected by this operation.""" +return (self.execution_time_metrics(transform_id) + +self.element_count_metrics(transform_id) + +self.user_metrics(transform_id)) Review comment: Perf optimization: Maybe return generators instead of creating lists here, but you can iterate on that later. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 134673) Time Spent: 1h 40m (was: 1.5h)
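The performance suggestion in the message above can be illustrated with `itertools.chain`, which lazily concatenates iterables instead of building intermediate lists. The three helper methods below are simplified placeholders that yield strings; the methods in the PR return lists of MonitoringInfo protos, and `OperationSketch` is a hypothetical name.

```python
import itertools


class OperationSketch(object):
  """Hypothetical stand-in for an Operation; yields strings, not protos."""

  def execution_time_metrics(self, transform_id):
    yield '%s:total_msecs' % transform_id

  def element_count_metrics(self, transform_id):
    yield '%s:element_count' % transform_id

  def user_metrics(self, transform_id):
    yield '%s:user_counter' % transform_id

  def monitoring_infos(self, transform_id):
    """Lazily chains the three sources; nothing is materialized here."""
    return itertools.chain(
        self.execution_time_metrics(transform_id),
        self.element_count_metrics(transform_id),
        self.user_metrics(transform_id))


infos = list(OperationSketch().monitoring_infos('t1'))
```

Callers that need a list (for example, to populate a repeated proto field) would still have to materialize the iterator once.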
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134671 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 19:56 Start Date: 14/Aug/18 19:56 Worklog Time Spent: 10m Work Description: pabloem commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r209769110 ## File path: sdks/python/apache_beam/metrics/cells.py ## @@ -161,6 +161,14 @@ def get_cumulative(self): with self._lock: return self.value + def to_runner_api_monitoring_info(self): Review comment: This function is being added to the `CounterCell` class for counters, but to `DistributionData` and `GaugeData` instead of `DistributionCell`/`GaugeCell`. Perhaps we want to be consistent here (adding a `CounterData` class, or making the Cell classes the main data-holding unit)? WDYT? Issue Time Tracking --- Worklog Id: (was: 134671) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134636 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 17:49 Start Date: 14/Aug/18 17:49 Worklog Time Spent: 10m Work Description: pabloem commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#issuecomment-412958625 I'm doing a review. I hope to finish by today. Issue Time Tracking --- Worklog Id: (was: 134636) Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134635 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 17:46 Start Date: 14/Aug/18 17:46 Worklog Time Spent: 10m Work Description: aaltay commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210043498 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -313,6 +313,14 @@ def metrics(self): self._fix_output_tags(transform_id, op.progress_metrics()) for transform_id, op in self.ops.items()}) + def monitoring_infos(self): Review comment: The first approach would be preferred; if this will be a large burden, it is fine to keep the existing setup. Issue Time Tracking --- Worklog Id: (was: 134635) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=134634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-134634 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 14/Aug/18 17:46 Start Date: 14/Aug/18 17:46 Worklog Time Spent: 10m Work Description: aaltay commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r210043194 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -1304,17 +1351,25 @@ def query(self, filter=None): class RunnerResult(runner.PipelineResult): - def __init__(self, state, metrics_by_stage): + def __init__(self, state, metrics_by_stage, monitoring_infos_by_stage): super(RunnerResult, self).__init__(state) self._metrics_by_stage = metrics_by_stage +self._monitoring_infos_by_stage = monitoring_infos_by_stage self._user_metrics = None def wait_until_finish(self, duration=None): return self._state - def metrics(self): + def metrics(self, use_monitoring_infos=False, user_metrics_only=True): Review comment: This is a fine change; however, I would prefer to remove the legacy stuff and change to the new metrics at the same time. Is there a reason to keep both for now? Issue Time Tracking --- Worklog Id: (was: 134634) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=133842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-133842 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 10/Aug/18 22:31 Start Date: 10/Aug/18 22:31 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r209398946 ## File path: sdks/python/apache_beam/metrics/monitoring_infos.py ## @@ -0,0 +1,216 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +# cython: language_level=3 +# cython: profile=True + +from __future__ import absolute_import +from apache_beam.portability.api import beam_fn_api_pb2 +from apache_beam.metrics.cells import DistributionData +from apache_beam.metrics.cells import DistributionResult +from apache_beam.metrics.cells import GaugeResult +from apache_beam.metrics.cells import GaugeData +from google.protobuf import timestamp_pb2 + +import time + +USER_COUNTER_URN_PREFIX = 'beam:metric:user:' +ELEMENT_COUNT_URN = 'beam:metric:element_count:v1' +START_BUNDLE_MSECS_URN = ( +'beam:metric:pardo_execution_time:start_bundle_msecs:v1') +PROCESS_BUNDLE_MSECS_URN = ( +'beam:metric:pardo_execution_time:process_bundle_msecs:v1') +FINISH_BUNDLE_MSECS_URN = ( +'beam:metric:pardo_execution_time:finish_bundle_msecs:v1') +TOTAL_MSECS_URN = ( +'beam:metric:ptransform_execution_time:total_msecs:v1') + +# TODO(ajamato): Implement the remaining types, i.e. Double types +# Extrema types, etc. See: +# https://s.apache.org/beam-fn-api-metrics +SUM_INT64_TYPE = 'beam:metrics:SumInt64' Review comment: Can we stick with the existing case of snake_case instead of CamelCase for all URNs? Also, Eugene was able to move the URNs into proto form (https://github.com/apache/beam/pull/4672) to share across languages. Can we adopt the same now? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 133842) Time Spent: 1h (was: 50m)
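To illustrate the naming request in the message above, here is a small sketch that converts the CamelCase suffix of a type URN to snake_case. The helper `to_snake_case` and the resulting URN string are hypothetical; the thread does not state what the final type URNs should be.

```python
import re


def to_snake_case(name):
  """Inserts '_' before each interior capital letter, then lowercases."""
  return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()


# 'beam:metrics:SumInt64' is the constant quoted from the PR under review.
prefix, suffix = 'beam:metrics:SumInt64'.rsplit(':', 1)
snake_urn = '%s:%s' % (prefix, to_snake_case(suffix))
print(snake_urn)
```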
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=133829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-133829 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 10/Aug/18 21:42 Start Date: 10/Aug/18 21:42 Worklog Time Spent: 10m Work Description: ajamato commented on issue #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#issuecomment-412214244 @pabloem @aaltay Would you mind doing an initial review for me? I just want to get some feedback on the overall structure of the changes I have made and the testing method. Issue Time Tracking --- Worklog Id: (was: 133829) Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=133828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-133828 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 10/Aug/18 21:41 Start Date: 10/Aug/18 21:41 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r209391337 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -1304,17 +1351,25 @@ def query(self, filter=None): class RunnerResult(runner.PipelineResult): - def __init__(self, state, metrics_by_stage): + def __init__(self, state, metrics_by_stage, monitoring_infos_by_stage): super(RunnerResult, self).__init__(state) self._metrics_by_stage = metrics_by_stage +self._monitoring_infos_by_stage = monitoring_infos_by_stage self._user_metrics = None def wait_until_finish(self, duration=None): return self._state - def metrics(self): + def metrics(self, use_monitoring_infos=False, user_metrics_only=True): Review comment: Is this type of change reasonable, adding these extra options for pulling out the metrics? This is currently the only way we have to test metrics, and I added user_metrics_only to allow testing the system metrics. Note: We can update this code to only use use_monitoring_infos and delete all the legacy metrics once we are happy with this. But I have made use of Metrics/MonitoringInfos optional for now. Issue Time Tracking --- Worklog Id: (was: 133828) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=133822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-133822 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 10/Aug/18 21:35 Start Date: 10/Aug/18 21:35 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r209390191 ## File path: sdks/python/apache_beam/metrics/monitoring_infos.py ## @@ -0,0 +1,216 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# cython: language_level=3 +# cython: profile=True + +from __future__ import absolute_import +from apache_beam.portability.api import beam_fn_api_pb2 +from apache_beam.metrics.cells import DistributionData +from apache_beam.metrics.cells import DistributionResult +from apache_beam.metrics.cells import GaugeResult +from apache_beam.metrics.cells import GaugeData +from google.protobuf import timestamp_pb2 + +import time Review comment: I am curious about your overall thoughts on the use of monitoring_infos and metrics in comments and naming. 
Note: A Metric is a specific type of MonitoringInfo. See: https://s.apache.org/beam-fn-api-metrics Issue Time Tracking --- Worklog Id: (was: 133822) Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=133821&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-133821 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 10/Aug/18 21:35 Start Date: 10/Aug/18 21:35 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r209390191 ## File path: sdks/python/apache_beam/metrics/monitoring_infos.py ## @@ -0,0 +1,216 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# cython: language_level=3 +# cython: profile=True + +from __future__ import absolute_import +from apache_beam.portability.api import beam_fn_api_pb2 +from apache_beam.metrics.cells import DistributionData +from apache_beam.metrics.cells import DistributionResult +from apache_beam.metrics.cells import GaugeResult +from apache_beam.metrics.cells import GaugeData +from google.protobuf import timestamp_pb2 + +import time Review comment: I am curious about your overall thoughts on the use of monitoring_infos and metrics in comments and naming. 
Note: A Metric is a specific type of MonitoringInfo. Issue Time Tracking --- Worklog Id: (was: 133821) Time Spent: 20m (was: 10m)
[jira] [Work logged] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema
[ https://issues.apache.org/jira/browse/BEAM-4374?focusedWorklogId=133819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-133819 ] ASF GitHub Bot logged work on BEAM-4374: Author: ASF GitHub Bot Created on: 10/Aug/18 21:33 Start Date: 10/Aug/18 21:33 Worklog Time Spent: 10m Work Description: ajamato commented on a change in pull request #6205: [BEAM-4374] Implementing a subset of the new metrics framework in python. URL: https://github.com/apache/beam/pull/6205#discussion_r209389982 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -313,6 +313,14 @@ def metrics(self): self._fix_output_tags(transform_id, op.progress_metrics()) for transform_id, op in self.ops.items()}) + def monitoring_infos(self): Review comment: Curious about reviewer thoughts on how to test this best. Would it make sense to test BundleProcessor on its own? It might be a bit complex to set this up, not sure if this is the kind of testing approach that we want for beam. Currently I have tested this indirectly via fn_api_runner_test (this is how the existing user metrics were being tested, the 'system' element count and processing time metrics were not tested at all :( ) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 133819) Time Spent: 10m Remaining Estimate: 0h