Comment (from a bit of an outsider): Fantastic document, Valentyn.
Very, very insightful and interesting. We feel a lot of the same pain in Apache Airflow (actually even more, because we have not 20 but 620+ dependencies), but we are also a bit more advanced in how we manage our dependencies. Some of the ideas you had there are already tested and tried in Airflow, and some are a bit different, but we can definitely share "principles". We are also a little higher in the "supply chain" (i.e. the Apache Beam Python SDK is our dependency).

I left some suggestions and some comments describing in detail what the same problems look like in Airflow and how we addressed them (if we did), and I am happy to participate in further discussions. I am "the dependency guy" in Airflow and happy to share my experience and help work out some problems, especially those coming from using multiple google-client libraries and diamond dependencies. We are just now dealing with a similar issue, where we will likely have to do a massive update of several of our clients, hopefully with the involvement of the Composer team. I'd love to be involved in a joint discussion with the google client team to work out some common expectations that we can rely on when we define our future upgrade strategy for google clients.

I will watch the discussion here and am happy to spend quite some time helping to hash it out.

BTW, you can also watch the talk I gave last year at PyWaw, "Managing Python dependencies at Scale" (https://www.youtube.com/watch?v=_SjMdQLP30s&t=2549s), where I explain the approach we took, the reasoning behind it, etc.

J.

On Wed, Aug 24, 2022 at 2:45 AM Valentyn Tymofieiev via dev <dev@beam.apache.org> wrote:

> Hi everyone,
>
> Recently, several issues [1-3] have highlighted outage risks and
> developer inconveniences due to dependency management practices in Beam
> Python.
>
> With dependabot and other tooling that we have integrated with Beam, one
> of the missing pieces seems to be having a clear guideline of how we should
> be specifying requirements for our dependencies and when and how we should
> be updating them to have a sustainable process.
>
> As a conversation starter, I put together a retrospective
> <https://docs.google.com/document/d/1gxQF8mciRYgACNpCy1wlR7TBa8zN-Tl6PebW-U8QvBk/edit?resourcekey=0-XcHRyFh4KRPkA0GsdUmU3g#>[4]
> covering a recent incident and would like to get community opinions on the
> open questions.
>
> In particular, if you have experience managing dependencies for other
> Python libraries with rich dependency chains, knowledge of available
> tooling or first hand experience dealing with other dependency issues in
> Beam, your input would be greatly appreciated.
>
> Thanks,
> Valentyn
>
> [1] https://github.com/apache/beam/issues/22218
> [2] https://github.com/apache/beam/pull/22550#issuecomment-1217348455
> [3] https://github.com/apache/beam/issues/22533
> [4] https://docs.google.com/document/d/1gxQF8mciRYgACNpCy1wlR7TBa8zN-Tl6PebW-U8QvBk/edit