FYI, I have recently observed a large amount of test failures in Beam test
suites where Dataflow Jobs failed due to a lack of CPU quota in
apache-beam-testing project.

We have been adding new suites for Python 3.x versions, which may have
contributed to this. problem.

I have not investigated yet what consumes the quota yet, but the usage
remains high.

Possible mitigation options:
- Increase quota.
- Decrease per-suite parallelism [1]. Currently we may  run 1-8 tests from
the same suite concurrently.
- Audit usage, perhaps kill stale jobs or VMs.

Ideas/opinions welcome.

I opened https://issues.apache.org/jira/browse/BEAM-7085 to track this.

[1]
https://github.com/apache/beam/search?q=%22--processes%3D%22&unscoped_q=%22--processes%3D%22

Reply via email to