Hey everyone,

Currently, during the Jenkins build process, we don't generate any code coverage reports for the Python SDK. This email contains a proposal to generate Python coverage reports during the pre-commit build, upload them to codecov.io for analysis and visualization, and automatically post the resulting stats back to GitHub PRs to help developers decide whether their tests need revision.
You can view and comment on the proposal here <https://docs.google.com/document/d/1K6zEJnDNvk7TkqDEPwoIa7NTj5KxtUIdq-TGewUgpmo/edit?usp=sharing>, or read the full text at the end of this email. Let me know what you think, or if you have any suggestions for improvement. Thanks!

Python Coverage Reports For CI/CD

Author: Saavan Nanavati ([email protected])
Reviewer: Udi Meiri ([email protected])

Overview

This is a proposal for generating code coverage reports for the Python SDK during Jenkins' pre-commit phase and uploading them to codecov.io <http://www.codecov.io> for analysis, with integration back into GitHub using the service's sister app <https://github.com/marketplace/codecov>. This would extend the pre-commit build time, but it would give developers valuable information for revising and improving their tests before their PR is merged, rather than after, when it's less likely they will go back to improve their coverage numbers.

This particular 3rd-party service has a litany of awesome benefits:
- It's free for open-source projects
- It seamlessly integrates into GitHub via a comment bot (example here <https://github.com/mockito/mockito/pull/1355#issuecomment-377795325>)
- It overlays coverage report information directly onto GitHub code using Sourcegraph <https://docs.codecov.io/docs/browser-extension>
- It requires no changes to Jenkins, thereby reducing the risk of breaking the live test-infra
- It's extensible and can later be used for the Java & Go SDKs if it proves to be awesome
- It has an extremely responsive support team that's happy to help open-source projects

A proof-of-concept can be seen here <https://codecov.io/gh/saavan-google-intern/beam/tree/dd529c6990c9d0d54ade2f5d65bef8542a3237b4/sdks/python/apache_beam> and here <https://codecov.io/gh/saavan-google-intern/beam/src/dd529c6990c9d0d54ade2f5d65bef8542a3237b4/sdks/python/apache_beam/typehints/decorators.py>.
Goals

- Provide coverage stats for the Python SDK that update with every pre-commit run
- Integrate these reports into GitHub so developers can take advantage of the information
- Open a discussion on how these coverage results can be used in code reviews

Non-Goals

- Calculate coverage statistics using external tests located outside of the Python SDK

This would be ideal, but it would require not only merging multiple coverage reports together but, more importantly, waiting for those tests to be triggered in the first place. The main advantage of calculating coverage during pre-commit is that developers can revise their PRs before merging, which is not guaranteed if this were a goal. However, it could be something to explore in the future.

Background

Providing code coverage for the Python SDK has been a problem since at least 2017 (BEAM-2762 <https://issues.apache.org/jira/browse/BEAM-2762>). The previous solution was to calculate coverage in post-commit with coverage.py <https://coverage.readthedocs.io/en/coverage-5.1/> and then send the report to coveralls.io <https://coveralls.io>, which would post to GitHub. At some point this solution broke, and the Tox environment used to compute coverage, cover <https://github.com/apache/beam/blob/6c95955b8bd91d4933bba57ccf1c5114563a2d21/sdks/python/tox.ini#L231>, was turned off but still remains in the codebase.

In the past, there have been four main barriers to re-implementing coverage; each is addressed here.

1. It's difficult to unify coverage for some integration tests, especially ones that rely on 3rd-party dependencies like GCP, since it's not possible to calculate coverage for the dependencies.
   - As stated earlier, this is a non-goal for the proposal.
2. The test reporter outputs results into the same directory, which sometimes causes previous results to be overwritten. This occurs when using different parameters for the same test (e.g. running a test with Dataflow vs. the DirectRunner).
   - This was mainly a post-commit problem, but it does require exploration since it could also be an issue for pre-commit. Even in the worst case, the coverage numbers would still be valuable: you could still see how coverage changed between commits, even if the absolute numbers were slightly inaccurate.
3. It's time-consuming and non-trivial to modify and test changes to Jenkins.
   - We don't need to - this proposal integrates directly with codecov.io <http://www.codecov.io>, making Jenkins an irrelevant part of the testing infrastructure with regard to code coverage. It's not just easier, it's better, because it provides many benefits (mentioned in the Overview section) that a Jenkins-based solution like Cobertura <https://cobertura.github.io/cobertura/> doesn't have. Lastly, using codecov.io would place less strain and maintenance on Jenkins (fewer jobs to run, no need to store the reports, etc.).
4. coveralls.io isn't the best to work with (email thread <https://lists.apache.org/thread.html/r0887fcfb0df820b624679ce63ac0c6c4ea4240da338c92c1d1ab71f2%40%3Cdev.beam.apache.org%3E>).
   - codecov.io <http://www.codecov.io> ;)

Design Details

To utilize codecov.io, the existing Tox environment cover <https://github.com/apache/beam/blob/6c95955b8bd91d4933bba57ccf1c5114563a2d21/sdks/python/tox.ini#L231> needs to be added to the list of toxTasks that Gradle runs in pre-commit. To reduce strain (and more-or-less duplicate information), coverage only needs to be calculated for one Python version (say 3.7, with cloud dependencies) rather than all of them.

To be compatible with Jenkins and codecov.io, the Tox environment should be configured as follows:

    passenv = GIT_* BUILD_* ghprb* CHANGE_ID BRANCH_NAME JENKINS_* CODECOV_*
    deps =
        ...
        codecov
    commands =
        ...
        codecov -t CODECOV_TOKEN

There are two options for what goes into the ellipsis (...).
First, we can use coverage.py <https://coverage.readthedocs.io/en/coverage-5.1/> to compute coverage (which is how it's currently set up), but that uses the old nose test runner. Alternatively, we can switch to computing coverage with pytest <https://docs.pytest.org/en/stable/> and pytest-cov <https://pypi.org/project/pytest-cov/>, which would give more accurate numbers and is the preferred solution.

Additionally, the CODECOV_TOKEN, which can be retrieved here <https://codecov.io/gh/apache/beam/>, should be added as an environment variable in Jenkins.

Finally, a YAML file should be added so codecov.io can resolve file path errors <https://docs.codecov.io/docs/fixing-paths>. This file can also be used to define success thresholds and more (for example, here <https://github.com/GoogleChrome/lighthouse/blob/master/.codecov.yml>), though ideally this coverage task should never fail a Jenkins build, due to the risk of false negatives.

A proof-of-concept for this design can be seen here <https://codecov.io/gh/saavan-google-intern/beam/tree/dd529c6990c9d0d54ade2f5d65bef8542a3237b4/sdks/python/apache_beam> and here <https://codecov.io/gh/saavan-google-intern/beam/src/dd529c6990c9d0d54ade2f5d65bef8542a3237b4/sdks/python/apache_beam/typehints/decorators.py> (this is code coverage for my fork).
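To make the Design Details a bit more concrete, here is a rough sketch of what a completed cover environment could look like under the preferred pytest-cov option. This is only an illustration: the pytest flags, the coverage target, and anything standing in for the elided commands are my assumptions, not the final configuration.

```ini
; Illustrative sketch of a [testenv:cover] section -- flags and targets
; are assumptions, not the actual Beam configuration.
[testenv:cover]
passenv = GIT_* BUILD_* ghprb* CHANGE_ID BRANCH_NAME JENKINS_* CODECOV_*
deps =
    pytest
    pytest-cov
    codecov
commands =
    ; produce an XML report that the codecov uploader can find
    pytest --cov=apache_beam --cov-report=xml
    codecov -t CODECOV_TOKEN
```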
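Likewise, a minimal codecov YAML along these lines could handle path fixing and keep the coverage checks from ever failing a build. The path fix and the specific settings below are illustrative assumptions; the linked codecov docs have the exact syntax.

```yaml
# Illustrative sketch of a codecov.yml -- settings are assumptions.
codecov:
  require_ci_to_pass: no

coverage:
  status:
    project:
      default:
        informational: true  # report coverage, but never fail the build
    patch:
      default:
        informational: true

fixes:
  - "::sdks/python/"  # assumed prefix fix so report paths match the repo layout
```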
