liferoad opened a new pull request, #35049: URL: https://github.com/apache/beam/pull/35049
**Testing Jules**

Fixes https://github.com/apache/beam/issues/24776

This change addresses Apache Beam GitHub Issue #24776, where race conditions could occur during the collection of monitoring information in the Python SDK Harness, leading to errors such as:

- SystemError: returned NULL without setting an error
- RuntimeError: dictionary changed size during iteration
- AttributeError: 'bytes' object has no attribute 'payload'
- ValueError: non-UTF-8 strings

The primary cause was concurrent access to metric data structures (specifically `MetricsContainer` and its underlying `MetricCell`s) by the DoFn execution thread (updating metrics) and the thread responsible for reporting bundle progress.

The fix introduces the following:

1. A `threading.Lock` is added to the `MetricsContainer` class. This lock is acquired before any access or modification of the internal dictionaries that store metric cells (`self.counters`, `self.distributions`, `self.gauges`). This protection is applied during metric cell retrieval/creation (`get_metric_cell`) and when all monitoring information is collected for reporting (`to_runner_api_monitoring_infos`).
2. The `MetricsContainer`'s lock is passed to individual `MetricCell` instances (`CounterCell`, `DistributionCell`, `GaugeCell`) upon their creation.
3. Metric update methods within `CounterCell`, `DistributionCell`, and `GaugeCell` (e.g., `update()`, `set()`, `add_data()`) now acquire this container-level lock before modifying their internal state. This ensures that updates are atomic with respect to the collection process in `MetricsContainer.to_runner_api_monitoring_infos`.

These changes ensure that metric data is read and updated in a thread-safe manner, preventing the previously observed errors caused by concurrent access and modification of shared metric state.
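The locking scheme described in the fix can be illustrated with a minimal, self-contained sketch. This is not the code from the pull request and not Beam's actual `MetricsContainer`; the names `SketchCounterCell`, `SketchMetricsContainer`, `get_counter`, `snapshot`, and `run_demo` are hypothetical and exist only for illustration. It assumes a single container-level `threading.Lock` shared with every cell, and shows one thread updating counters while another repeatedly collects them.

```python
import threading


class SketchCounterCell:
    """Hypothetical counter cell that shares its container's lock."""

    def __init__(self, lock):
        self._lock = lock
        self.value = 0

    def update(self, delta):
        # Take the container-level lock so an update cannot interleave
        # with a snapshot taken by the reporting thread.
        with self._lock:
            self.value += delta


class SketchMetricsContainer:
    """Hypothetical container; an illustration, not Beam's MetricsContainer."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counters = {}

    def get_counter(self, name):
        # Lookup-or-create happens under the lock, so the dict never
        # changes size while snapshot() is iterating over it.
        with self._lock:
            cell = self.counters.get(name)
            if cell is None:
                cell = SketchCounterCell(self._lock)
                self.counters[name] = cell
            return cell

    def snapshot(self):
        # Read cell.value directly instead of calling a locking method:
        # threading.Lock is not reentrant, so re-acquiring it here
        # would deadlock.
        with self._lock:
            return {name: cell.value for name, cell in self.counters.items()}


def run_demo(num_counters=50, updates_per_counter=200):
    container = SketchMetricsContainer()
    done = threading.Event()

    def worker():
        # Stands in for the DoFn execution thread.
        for i in range(num_counters):
            for _ in range(updates_per_counter):
                container.get_counter(f"counter_{i}").update(1)
        done.set()

    def reporter():
        # Stands in for the bundle progress reporting thread.
        while not done.is_set():
            container.snapshot()

    threads = [
        threading.Thread(target=worker),
        threading.Thread(target=reporter),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return container.snapshot()
```

One design point worth noting in a scheme like this: because the cells and the container share a plain `threading.Lock`, any code path that already holds the lock must not call a cell method that acquires it again. The sketch avoids this by reading cell state directly inside `snapshot()`; using `threading.RLock` would be an alternative.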
Follow this checklist to help us incorporate your contribution quickly and easily:

- [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).

To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)

GitHub Actions Tests Status (on master branch)
------------------------------------------------------------------------------------------------

- [Build python source distribution and wheels](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
- [Python Tests](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
- [Java Tests](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
- [Go tests](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)

See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
