Thanks for writing this up, Valentyn!

I'm curious, Jarek: does Airflow take any dependencies on popular libraries
like pandas, numpy, pyarrow, scipy, etc., which users are likely to have
their own dependency on? I think these dependencies are challenging in a
different way than the client libraries - ideally we would support a wide
version range so as not to require users to upgrade those libraries in
lockstep with Beam. However, in some cases our dependency is pretty tight
(e.g. the DataFrame API's dependency on pandas), so we need to make sure to
explicitly test with multiple different versions. Does Airflow have any
similar issues?
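
To make the "test with multiple versions" idea a bit more concrete, here is
a rough sketch of picking which versions inside a supported range to test
against. The bounds, the candidate list, and the helper names (parse,
in_range, build_pins) are all made up for illustration; they are not Beam's
actual pandas constraints, and the parser only handles plain X.Y.Z versions.

```python
# Hypothetical sketch: bounds and candidate versions are illustrative only,
# not the real constraints of Beam or any other project.

def parse(version):
    """Turn a plain 'X.Y.Z' release string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, lower, upper):
    """True if lower <= version < upper (lower inclusive, upper exclusive)."""
    return parse(lower) <= parse(version) < parse(upper)


def build_pins(package, candidates, lower, upper):
    """Return pip-style pins for every candidate version inside the range."""
    return [f"{package}=={v}" for v in candidates if in_range(v, lower, upper)]


# Assumed candidate releases to consider for the test matrix.
CANDIDATES = ["1.3.5", "1.4.3", "1.5.0", "2.0.0"]

pins = build_pins("pandas", CANDIDATES, lower="1.4.0", upper="2.0.0")
print(pins)  # each pin would become one entry in a CI test matrix
```

Each resulting pin could then be installed in its own test environment, so
the tight dependency is exercised at both ends of the declared range rather
than only at whatever version happens to resolve by default.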

Thanks!
Brian

On Thu, Aug 25, 2022 at 5:36 PM Valentyn Tymofieiev via dev <
dev@beam.apache.org> wrote:

> Hi Jarek,
>
> Thanks a lot for the detailed feedback and for sharing the Airflow story;
> this is exactly what I was hoping to hear in response from the mailing list!
>
> 600+ dependencies is very impressive, so I'd be happy to chat more and
> learn from your experience.
>
> On Wed, Aug 24, 2022 at 5:50 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> A comment (from a bit of an outsider)
>>
>> Fantastic document Valentyn.
>>
>> Very, very insightful and interesting. We feel a lot of the same pain in
>> Apache Airflow (actually even more, because we have not 20 but 620+
>> dependencies), but we are also a bit more advanced in how we manage
>> the dependencies - some of the ideas you had there are already tested and
>> tried in Airflow, some of them are a bit different, but we can definitely
>> share "principles", and we are a little higher in the "supply chain"
>> (i.e. Apache Beam Python SDK is our dependency).
>>
>> I left some suggestions and some comments describing in detail what the
>> same problems look like in Airflow and how we addressed them (if we did),
>> and I am happy to participate in further discussions. I am "the dependency
>> guy" in Airflow and happy to share my experiences and help to work out some
>> problems - and especially help to solve problems coming from using multiple
>> google-client libraries and diamond dependencies (we are just now dealing
>> with a similar issue, where we will likely have to do a massive update of
>> several of our clients, hopefully with the involvement of the Composer
>> team). And I'd love to be involved in a joint discussion with the google
>> client team to work out some common expectations that we can rely on when
>> we define our future upgrade strategy for google clients.
>>
>> I will watch it here and be happy to spend quite some time on helping to
>> hash it out.
>>
>> BTW, you can also watch the talk I gave last year at PyWaw about "Managing
>> Python dependencies at Scale"
>> https://www.youtube.com/watch?v=_SjMdQLP30s&t=2549s where I explain the
>> approach we took, the reasoning behind it, etc.
>>
>> J.
>>
>>
>> On Wed, Aug 24, 2022 at 2:45 AM Valentyn Tymofieiev via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi everyone,
>>>
>>> Recently, several issues [1-3] have highlighted outage risks and
>>> developer inconveniences due to dependency management practices in Beam
>>> Python.
>>>
>>> With dependabot and other tooling that we have integrated with Beam,
>>> one of the missing pieces seems to be a clear guideline on how we should
>>> specify requirements for our dependencies, and when and how we should
>>> update them, so that we have a sustainable process.
>>>
>>> As a conversation starter, I put together a retrospective
>>> <https://docs.google.com/document/d/1gxQF8mciRYgACNpCy1wlR7TBa8zN-Tl6PebW-U8QvBk/edit?resourcekey=0-XcHRyFh4KRPkA0GsdUmU3g#>[4]
>>> covering a recent incident and would like to get community opinions on the
>>> open questions.
>>>
>>> In particular, if you have experience managing dependencies for other
>>> Python libraries with rich dependency chains, knowledge of available
>>> tooling, or first-hand experience dealing with other dependency issues in
>>> Beam, your input would be greatly appreciated.
>>>
>>> Thanks,
>>> Valentyn
>>>
>>> [1] https://github.com/apache/beam/issues/22218
>>> [2] https://github.com/apache/beam/pull/22550#issuecomment-1217348455
>>> [3] https://github.com/apache/beam/issues/22533
>>> [4]
>>> https://docs.google.com/document/d/1gxQF8mciRYgACNpCy1wlR7TBa8zN-Tl6PebW-U8QvBk/edit
>>>
>>
