potiuk commented on a change in pull request #21825:
URL: https://github.com/apache/airflow/pull/21825#discussion_r815319195



##########
File path: dev/TRACKING_BACKTRACKING_ISSUES.md
##########
@@ -0,0 +1,216 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Backtracking issues context](#backtracking-issues-context)
+- [What can we do about it?](#what-can-we-do-about-it)
+- [How to detect it](#how-to-detect-it)
+- [How to track the root cause](#how-to-track-the-root-cause)
+- [Finding candidates manually](#finding-candidates-manually)
+
+# Backtracking issues context
+
+The `pip` tool we use in Airflow has a long-standing problem: backtracking sometimes kicks in seemingly
+at random. We have very little control over this, because the moment backtracking kicks in depends on
+how many versions of conflicting packages have been released on `PyPI`, and that can change completely
+without any change to Airflow. We have a `constraints` mechanism to protect users installing released
+versions and developers making "regular" PRs. However, in `main` builds and in PRs that change
+`setup.py`, backtracking might lead to extremely long image builds (many hours) and eventually to
+the image build jobs being cancelled in CI.
+
+An example of such an issue is described here: https://github.com/pypa/pip/issues/10924.
+
+Unfortunately, in such cases it is not possible to figure out what caused the problem from the `pip`
+output (state as of `pip` 22.0.3).
+
+A number of `pip` issues describe the problem, and some causes of backtracking have already been
+tracked down and fixed by the `pip` maintainers. It is a difficult problem to solve, however, and it is
+likely to be with us for a while. Some other open issues:
+
+* https://github.com/pypa/pip/issues/10884
+* https://github.com/pypa/pip/issues/10235
+* https://github.com/pypa/pip/issues/10417
+* https://github.com/pypa/pip/issues/9254
+* https://github.com/pypa/pip/issues/10788
+
+Also, a PR that might help relatively soon is here:
+
+* https://github.com/pypa/pip/pull/10258
+
+# What can we do about it?
+
+Until `pip` gets better at avoiding backtracking, or at detecting and showing the root cause of the
+conflict, the only method available is unfortunately trial and error. We need to track down which
+dependencies have changed recently and try to pinpoint the root cause of the backtracking. This is not
+easy, because sometimes the root cause is not at all obvious and depends on hidden limitations and design
+choices of the `pip` backtracking algorithm, which produce non-obvious problems.
+
+Issue https://github.com/pypa/pip/issues/10924 is a good example of that.
+
+# How to detect it
+
+Whenever such a situation occurs, our `build image` workflow for `main` will start to
+get cancelled on timeout.
+
+https://github.com/apache/airflow/actions/workflows/build-images.yml?query=event%3Apush+branch%3Amain
+
+You might see various errors:
+
+```
+#32 3883.7 INFO: pip is looking at multiple versions of NNNN to determine which version is compatible with other requirements. This could take a while.
+Error: The operation was canceled.
+```
+
+Or you might see errors about various pip installation problems:
+
+```
+#32 664.1 Collecting Flask-OpenID<2,>=1.2.5
+  #32 664.2   Downloading Flask-OpenID-1.2.5.tar.gz (43 kB)
+  #32 664.2      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.4/43.4 KB 181.6 
MB/s eta 0:00:00
+  #32 664.2   Preparing metadata (setup.py): started
+  #32 664.3   Preparing metadata (setup.py): finished with status 'error'
+  #32 664.3   error: subprocess-exited-with-error
+  #32 664.3
+  #32 664.3   × python setup.py egg_info did not run successfully.
+  #32 664.3   │ exit code: 1
+  #32 664.3   ╰─> [1 lines of output]
+  #32 664.3       error in Flask-OpenID setup command: use_2to3 is invalid.
+  #32 664.3       [end of output]
+```
+
+The important thing is that the `main` image builds suddenly stop working without any action on our side.
+
+# How to track the root cause
+
+Whenever a condition occurs that leads to cancelling the CI image build, CI runs the
+"Candidates for pip resolver backtrack triggers" jobs. Those jobs list the packages that have been
+updated in the last day, since the last successful `main` build.
+
+You need to find the first such failing job from the
+[list](https://github.com/apache/airflow/actions/workflows/build-images.yml?query=event%3Apush+branch%3Amain).
+
+There you should find the list of packages, with information about which versions were released and
+when. You will also find a command that you can use for tracking down the package, similar to:
+
+```shell
+pip install ".[devel_all]" --upgrade --upgrade-strategy eager \
+        "dill<0.3.3" "certifi<2021.0.0" "google-ads<14.0.1" "package1==N.N.N" "package2==N.N.N" ...
+```
+
+The candidate packages are the ones with `==`. The command attempts to install each suspicious package
+in the version that was installed successfully before and is stored in the current constraints.
+
+The process of tracking down which package is the "root cause" looks as follows:
+
+1. Check out the latest `main` of Airflow
+2. Build the latest image (with constraints): `./breeze build-image --python 3.7`
+3. Enter breeze: `./breeze`
+4. Attempt to run the `pip install` command that was printed in the "Candidates ..." job

Review comment:
       ```suggestion
   4. Attempt to run the `pip install` command that was printed in the "Candidates ..." step
   ```
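
For context, the command printed by that step pins each candidate to the version stored in the current constraints. A minimal sketch of how such a command could be assembled is below. It is not the actual CI code: `parse_constraints`, `build_pip_command` and `BASE_LIMITS` are hypothetical names, and the sketch assumes the constraints are plain `name==version` lines and that candidate names match the constraint names up to letter case.

```python
import shlex

# Upper-bound limits copied from the example command quoted in the document.
BASE_LIMITS = ["dill<0.3.3", "certifi<2021.0.0", "google-ads<14.0.1"]


def parse_constraints(text):
    """Return a {lowercased name: version} map built from 'name==version' lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def build_pip_command(candidates, pins):
    """Build the pip command, pinning every candidate that has a known-good version."""
    args = ["pip", "install", ".[devel_all]", "--upgrade", "--upgrade-strategy", "eager"]
    args += BASE_LIMITS
    # Candidates without an entry in the constraints are skipped, not guessed.
    args += [f"{name}=={pins[name.lower()]}" for name in candidates if name.lower() in pins]
    # Quote each argument so that '<' and '[' survive a copy-paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    example = "requests==2.27.1\nFlask==1.1.4  # known-good version\n"
    print(build_pip_command(["Flask", "requests"], parse_constraints(example)))
```

Removing pins from the generated command one at a time is then a matter of passing a shorter `candidates` list.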



