TL;DR:  I would like to make a proposal to add (easily managed) pre-commit
hooks with various lint checkers part of recommended (and encouraged)
development workflow for Airflow. This will help with speeding-up local
development cycle and decrease pressure on Travis CI.

Over the last few months in our other oozie-to-airflow
<https://github.com/GoogleCloudPlatform/oozie-to-airflow> project we've
been running pre-commit hooks - we've integrated the fantastic
https://pre-commit.com/ framework and we never looked back. The upcoming
changes to docker images (AIP-10) makes it easy to standardise pylint/mypy
and other checks to be the same for every developer (using docker images)
so it also makes possible to have consistent pre-commit environment for
static checks for everyone.

Pre-commit framework has a number of cool features you would expect from
such tool:

- it is super easy to install and manage the checks ('pre-commit install')
- all the checks are managed in single .pre-commit-config.yaml file (in
repo)
- it allows to run various checks only on the changes you are committing
- it has nice and simple UI to run checks individually or skip some checks,
or run on all files
- it automatically uses all processors on your machine (it will distribute
changed files evenly)
- it has great UI for reporting what's going on
- it has a number of pre-defined checks that are super-useful
   - check if you have not left debug instructions (Ash :) - no more
problems with that!)
   - check if you resolved merge conflicts
   - check if you have not submitted a private key
   - check correctness of various files (shell, yaml, xml, whether shebangs
are added correctly etc.)
   - ....
- it can even correct some of the problems found (it leaves corrected files
for you to commit):
   - automatically adding missing licences
   - automatically add python-encoding pragma
   - add Table of Contents for markdown files
   - and many more
- it can be integrated in CI to run the very same checks but on all files
during CI.
- you can use --no-verify switch to skip pre-commit

There are multiple advantages of using pre-commit hooks. It not only speeds
up local development (no more waiting for Travis) but if most people will
start using it - it will also decrease the pressure on Travis - less people
will be submitting PRs just to check if they pass all the checks. I think
that might be single biggest thing that we can do to have far less CPU used
by Airflow builds.

There is one disadvantage I noticed - when you have unstaged changes,
pre-commit stashes them before running checks and sometimes (if you Ctrl^C
very quickly) it will not stash them back - but it is recoverable it just
needs one liner to recover. In normal circumstances it is really safe to
run.

I already made a working POC of pre-commit (not intended for merge - it has
many more unrelated changes that I am going to submit as separate PRs). It
takes less than 20-30 seconds to run all the checks in a small commit
(including pylint/mypy and other checks) so IMHO it is perfectly OK to run
on every commit. You can see the Job in CI here:
https://travis-ci.org/PolideaInternal/airflow/jobs/552862055 and a short
documentation I wrote on using pre-commits here:
https://github.com/PolideaInternal/airflow/blob/pre-commit-hooks/CONTRIBUTING.md#pre-commit-hooks

Here are the checks I managed to get running for Airflow (pylint is skipped
for now until we finish it but it can be tested in separate job - I also
did it in the same build):

      Docker build - might take longer on setup.py change (see
./.build/cache dir for logs).......................Passed
Add licence for all XML, md
files...........................................................................Passed
Add licence for all JS
files................................................................................Passed
Add licence for all SQL
files...............................................................................Passed
Add licence for all JINJA template
files....................................................................Passed
Add licence for all other
files.............................................................................Passed
No-tabs 
checker.............................................................................................Passed
Check yaml files with
yamllint..............................................................................Passed
Check that executables have
shebangs........................................................................Passed
Check for merge
conflicts...................................................................................Passed
Check 
Xml...................................................................................................Passed
Debug Statements
(Python)...................................................................................Passed
Detect Private 
Key..........................................................................................Passed
Fix python encoding
pragma..................................................................................Passed
Fix End of 
Files............................................................................................Passed
Mixed line 
ending...........................................................................................Passed
Check hooks apply to the
repository.........................................................................Passed
Add TOC to markdown
documents...............................................................................Passed
Check Shell scripts syntax
corectness.......................................................................Passed
Lint 
dockerfile.............................................................................................Passed
Run 
mypy....................................................................................................Passed
Run pylint for main
sources................................................................................Skipped
Run pylint for 
tests.......................................................................................Skipped
Run 
flake8..................................................................................................Passed


J.

J.

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Reply via email to