TL;DR: I would like to make a proposal to add (easily managed) pre-commit hooks with various lint checkers part of recommended (and encouraged) development workflow for Airflow. This will help with speeding-up local development cycle and decrease pressure on Travis CI.
Over the last few months in our other oozie-to-airflow <https://github.com/GoogleCloudPlatform/oozie-to-airflow> project we've been running pre-commit hooks - we've integrated the fantastic https://pre-commit.com/ framework and we never looked back. The upcoming changes to docker images (AIP-10) makes it easy to standardise pylint/mypy and other checks to be the same for every developer (using docker images) so it also makes possible to have consistent pre-commit environment for static checks for everyone. Pre-commit framework has a number of cool features you would expect from such tool: - it is super easy to install and manage the checks ('pre-commit install') - all the checks are managed in single .pre-commit-config.yaml file (in repo) - it allows to run various checks only on the changes you are committing - it has nice and simple UI to run checks individually or skip some checks, or run on all files - it automatically uses all processors on your machine (it will distribute changed files evenly) - it has great UI for reporting what's going on - it has a number of pre-defined checks that are super-useful - check if you have not left debug instructions (Ash :) - no more problems with that!) - check if you resolved merge conflicts - check if you have not submitted a private key - check correctness of various files (shell, yaml, xml, whether shebangs are added correctly etc.) - .... - it can even correct some of the problems found (it leaves corrected files for you to commit): - automatically adding missing licences - automatically add python-encoding pragma - add Table of Contents for markdown files - and many more - it can be integrated in CI to run the very same checks but on all files during CI. - you can use --no-verify switch to skip pre-commit There are multiple advantages of using pre-commit hooks. It not only speeds up local development (no more waiting for Travis) but if most people will start using it - it will also decrease the pressure on Travis - less people will be submitting PRs just to check if they pass all the checks. I think that might be single biggest thing that we can do to have far less CPU used by Airflow builds. There is one disadvantage I noticed - when you have unstaged changes, pre-commit stashes them before running checks and sometimes (if you Ctrl^C very quickly) it will not stash them back - but it is recoverable it just needs one liner to recover. In normal circumstances it is really safe to run. I already made a working POC of pre-commit (not intended for merge - it has many more unrelated changes that I am going to submit as separate PRs). It takes less than 20-30 seconds to run all the checks in a small commit (including pylint/mypy and other checks) so IMHO it is perfectly OK to run on every commit. You can see the Job in CI here: https://travis-ci.org/PolideaInternal/airflow/jobs/552862055 and a short documentation I wrote on using pre-commits here: https://github.com/PolideaInternal/airflow/blob/pre-commit-hooks/CONTRIBUTING.md#pre-commit-hooks Here are the checks I managed to get running for Airflow (pylint is skipped for now until we finish it but it can be tested in separate job - I also did it in the same build): Docker build - might take longer on setup.py change (see ./.build/cache dir for logs).......................Passed Add licence for all XML, md files...........................................................................Passed Add licence for all JS files................................................................................Passed Add licence for all SQL files...............................................................................Passed Add licence for all JINJA template files....................................................................Passed Add licence for all other files.............................................................................Passed No-tabs checker.............................................................................................Passed Check yaml files with yamllint..............................................................................Passed Check that executables have shebangs........................................................................Passed Check for merge conflicts...................................................................................Passed Check Xml...................................................................................................Passed Debug Statements (Python)...................................................................................Passed Detect Private Key..........................................................................................Passed Fix python encoding pragma..................................................................................Passed Fix End of Files............................................................................................Passed Mixed line ending...........................................................................................Passed Check hooks apply to the repository.........................................................................Passed Add TOC to markdown documents...............................................................................Passed Check Shell scripts syntax corectness.......................................................................Passed Lint dockerfile.............................................................................................Passed Run mypy....................................................................................................Passed Run pylint for main sources................................................................................Skipped Run pylint for tests.......................................................................................Skipped Run flake8..................................................................................................Passed J. J. -- Jarek Potiuk Polidea <https://www.polidea.com/> | Principal Software Engineer M: +48 660 796 129 <+48660796129> [image: Polidea] <https://www.polidea.com/>