And yes - I agree that the environmental effect in our case is smaller than a
"bare" Python benchmark would suggest - but I think it is still there.

There are a number of (valid) cases where people use Airflow not only to
orchestrate external services but also to run computationally or
logic-intensive tasks. CPU usage is then high on the Airflow workers, not only
on the scheduler - and in those cases the performance improvements will be
fairly visible.
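
For anyone who wants to check this on their own workload, here is a minimal
sketch of the kind of measurement I have in mind. It is not taken from Airflow:
`cpu_bound_task` is a made-up stand-in for logic-heavy task code, and the loop
size is arbitrary. The numbers only mean something when the same script is run
under the Python versions being compared.

```python
import sys
import timeit


def cpu_bound_task(n: int = 200_000) -> int:
    """Hypothetical stand-in for logic-heavy task code: pure-Python arithmetic."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def measure(repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` single runs."""
    return min(timeit.repeat(cpu_bound_task, number=1, repeat=repeats))


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version}: best run = {measure():.4f}s")
```

Running it inside two images that differ only in the "-python3.X" part of the
tag gives a rough idea of how much of the interpreter speedup survives for
CPU-bound task code. Tasks that mostly wait on the DB or on external services
will show a much smaller difference.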

On Fri, Nov 17, 2023 at 1:45 PM Jarek Potiuk <[email protected]> wrote:

> Yeah. I see Andrey's point - indeed, for quite some time we had a
> Python 3.11 exclusion for HDFS providers, until it was fixed. And we
> already have a built-in mechanism to exclude providers from certain
> versions of Python - it's part of the provider.yaml definition and we can
> deal with it easily.
>
> And this is one of the reasons we do not have Python 3.12 support YET -
> because we need to make sure that at least the vast majority of the
> important providers (and all those that are part of the "regular image")
> work with it.
> So I'd say if we stick to those rules, the stability of the "latest
> SUPPORTED" version + providers is not impacted. Of course someone might need
> a different provider that has no "latest" support - but that will be clearly
> documented in the Provider documentation (automatically) when it happens, and
> the user can easily make a deliberate decision to use a different tag
> (and the decision is very easy to turn into practice). Simply put, the
> impacted provider will refuse to install and the user will HAVE TO make the
> right call at installation time. So I am not concerned at all about
> "provider support" - this is basically solved by code.
>
> The dill + pendulum case is indeed a bit different - it is an edge case -
> but for me that is an indication that yes, we need to document it, but also,
> more importantly, that our test suite is insufficient. This error should
> have been detected in our unit tests (and at the very least we should have
> a PY311 exclusion for it). And then yes - I concur with the idea of having
> a "known incompatibilities" documentation - possibly even somewhat verified
> automatically against the list of such PY* exclusions we have in our test
> suite.
>
> Andrey - maybe we can discuss it in the issue you mentioned. Maybe we
> should add such a test case, and I am happy to make a PR to propose the
> "Known incompatibilities" page linked to those tests - if that would remove
> all the obstacles for that move.
>
> J.
>
>
> On Fri, Nov 17, 2023 at 9:48 AM Andrey Anshin <[email protected]>
> wrote:
>
>> Personally, for me this is a controversial change and a tradeoff between
>> stability and performance.
>>
>> Since Airflow + Providers have 400+ dependencies, using the lowest version
>> of Python provides better stability, and the reason for this is pretty
>> simple: package maintainers have had more time to make their packages
>> stable on older Python versions than on new ones. That is funny, because it
>> is difficult to call 3.11 "new" when it was released more than one year
>> ago. However, some of the core dependencies of Airflow have not been
>> updated for a long time and have only "formal" support of Python 3.11.
>>
>> A known issue with Airflow and Python 3.11 is dill + pendulum, see:
>> https://github.com/apache/airflow/issues/35307 . It does not affect all
>> users, so maybe we could resolve it by creating a "Known Incompatibilities"
>> page in the Airflow documentation.
>>
>> However, using tags without an explicit Python version is a very nice way
>> for users to shoot themselves in the foot or cast wormhole (depending on
>> your preference), especially if we talk about apache/airflow:latest. So I
>> would rather say it doesn't matter which Python version we use in this
>> case. Maybe it would even be better to choose the "golden mean" and use
>> Python 3.10, but that is also controversial because we would then have to
>> decide when to shift this selection. Selecting either the lowest or the
>> highest version is more straightforward than selecting a "golden mean".
>>
>> The performance and environment part is also important; however, we need
>> to take into account that Airflow is an application which uses its DB
>> backend very intensively, and this is a point where the total performance
>> advantages of changing the Python version are dramatically reduced.
>>
>> As an outcome, I do not have any objection to this potential change,
>> because there is no real difference between the strategies for "default"
>> Python selection: all of them have pros and cons. In my personal opinion,
>> using the Airflow image without a pinned Python version creates more
>> problems than it gives advantages, and you might have a nice time debugging
>> at the moment when the "default" version changes - and it will change in
>> any case.
>>
>> ----
>> Best Wishes
>> *Andrey Anshin*
>>
>>
>>
>> On Fri, 17 Nov 2023 at 10:10, Wei Lee <[email protected]> wrote:
>>
>> > Agreed, as long as users can still use different versions through tags,
>> > there are no drawbacks or incompatibilities with this great idea!
>> >
>> > Best,
>> > Wei
>> >
>> > > On Nov 17, 2023, at 1:39 PM, Aritra Basu <[email protected]> wrote:
>> > >
>> > > Agreed, moving to latest by default sounds like a fine idea. I don't see
>> > > any drawbacks to it, and it seems like as good a time as any to make the
>> > > switch with 2.8.0.
>> > >
>> > > --
>> > > Regards,
>> > > Aritra Basu
>> > >
>> > > On Fri, Nov 17, 2023, 12:33 AM Vincent Beck <[email protected]> wrote:
>> > >
>> > >> I agree, by default we should use the latest python version. Like any
>> > >> package manager, if the user does not explicitly specify a version, the
>> > >> latest should be used. If the user wants to use a lower version, he can
>> > >> always pin it.
>> > >>
>> > >> On 2023/11/16 12:06:17 Jarek Potiuk wrote:
>> > >>> Hello everyone,
>> > >>>
>> > >>> Since we are close to the Airflow 2.8.0 release, I would like to propose
>> > >>> a change in the approach for our "default" images.
>> > >>>
>> > >>> Currently there are a few images that are considered "default", for
>> > >>> example:
>> > >>>
>> > >>> apache/airflow:latest
>> > >>> apache/airflow:2.7.4
>> > >>>
>> > >>> Currently (according to our process [1] and user documentation [2])
>> > >>> those point to the "oldest" Python version we support (currently they
>> > >>> point to Python 3.8).
>> > >>>
>> > >>> There is no particular reason why it is like that, and with Airflow
>> > >>> 2.8.0 we have an opportunity to change it and point the default images
>> > >>> to the "latest supported" version (and keep this version as default for
>> > >>> the whole MINOR line of releases).
>> > >>>
>> > >>> In the case of Airflow 2.8.* - that would be "Python 3.11" being the
>> > >>> default for the whole 2.8.* line, unless we manage to get Python 3.12
>> > >>> support in our CI before we release Airflow 2.8.0; then it would be
>> > >>> Python 3.12.
>> > >>>
>> > >>> We do not have any SemVer promises about that. Users can still choose
>> > >>> to use the "2.8.0-python3.8" tag if they want.
>> > >>>
>> > >>> Generally going to 2.8 should always be a deliberate action, so we have
>> > >>> a chance to explain in the release notes what to do if they want to
>> > >>> stick to Python 3.8 in the 2.8 release. So they are not "losing"
>> > >>> anything, they can have 100% compatibility by just choosing a different
>> > >>> image in their deployment. This **might** cause a little hassle when
>> > >>> they migrate if they find some incompatibilities, but generally speaking
>> > >>> it's a very straightforward and simple change - just adding
>> > >>> "-python3.8" to your TAG - whatever deployment option you have. And our
>> > >>> users will have to go through it anyway every time we drop an old
>> > >>> Python version (and this change might be even more costly as they have
>> > >>> no choice then) - so it changes very little, just shifts the time when
>> > >>> they will have to do it.
>> > >>>
>> > >>> There are benefits of doing it - for both our users and, well, the
>> > >>> environment as well (and I really mean a positive impact on the "world
>> > >>> environment", to be honest). Maybe a little impact - but with Airflow's
>> > >>> popularity, it might make a (small) difference. Python 3.11 is
>> > >>> generally 30% faster than previous versions and using it by default
>> > >>> means that 30% less CPU is being wasted. Also it will mean actual money
>> > >>> savings for our users. Also Python 3.12 comes with even more
>> > >>> performance improvements, and keeping up with those by making it the
>> > >>> "default" is a pretty good idea.
>> > >>>
>> > >>> I cannot think of any other drawbacks of this change.
>> > >>>
>> > >>> WDYT?
>> > >>>
>> > >>> [1] Documented versioning approach:
>> > >>> https://github.com/apache/airflow#base-os-support-for-reference-airflow-images
>> > >>> [2] User documentation
>> > >>> https://airflow.apache.org/docs/docker-stack/index.html
>> > >>>
>> > >>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: [email protected]
>> > >> For additional commands, e-mail: [email protected]
>> > >>
>> > >>
>> >
>> >
>>
>
