And yes, I agree that in our case the environmental effect is smaller than a "bare" Python benchmark would suggest, but I think it is still there.
There are a number of (valid) cases where people use Airflow not purely to orchestrate external services but also to run computationally or logic-intensive tasks, so CPU usage is high on the Airflow workers and not only on the scheduler. In those cases the performance improvements will be fairly visible.

On Fri, Nov 17, 2023 at 1:45 PM Jarek Potiuk <[email protected]> wrote:

> Yeah. I see Andrey's point - indeed, for quite some time we had a
> Python 3.11 exclusion for HDFS providers, until it was fixed. We
> already have a built-in mechanism to exclude providers from certain
> versions of Python - it's part of the provider.yaml definition and we can
> deal with it easily.
>
> And this is one of the reasons we do not YET have Python 3.12 support -
> because we need to make sure that at least the vast majority of the
> important providers (and all those that are part of the "regular image")
> work with it.
> So I'd say that if we stick to those rules, the stability of the "latest
> SUPPORTED" version + providers is not impacted. Of course someone might
> need a different provider that has no "latest" support - but that will be
> clearly documented in the provider documentation (automatically) when it
> happens, and the user can easily make a deliberate decision to use a
> different tag (and the decision is very easy to turn into practice).
> Simply put, the impacted provider will refuse to install and the user will
> HAVE TO make the right call at installation time.
> So I am not concerned at all about
> "provider support" - this is basically solved by code.
>
> The dill + pendulum case is indeed a bit different - it is an edge case -
> but for me that is an indication that yes, we need to document it, but
> also, more importantly, that our test suite is insufficient. This error
> should have been detected in our unit tests (and at the very least we
> should have a PY311 exclusion). And then yes - I concur with the idea of
> having a "known incompatibilities" documentation, possibly even somewhat
> verified automatically against the list of such PY* exclusions we have in
> our test suite.
>
> Andrey - maybe we can discuss it in the issue you mentioned. Maybe we
> should add such a test case, and I am happy to make a PR to propose the
> "Known incompatibilities" page linked to those tests - if that would remove
> all the obstacles for that move.
>
> J.
>
>
> On Fri, Nov 17, 2023 at 9:48 AM Andrey Anshin <[email protected]>
> wrote:
>
>> Personally, for me it is a controversial change and a tradeoff between
>> stability and performance.
>>
>> Since Airflow + providers have 400+ dependencies, using the lowest version
>> of Python provides better stability, and the reason is pretty simple:
>> package maintainers have had more time to make their packages stable on
>> older Python versions than on newer ones. That is funny, because it is
>> difficult to call 3.11 a new one - it was released more than one year
>> ago. However, some of Airflow's core dependencies have not been updated
>> for a long time and have only "formal" support for Python 3.11.
>>
>> A known issue with Airflow and Python 3.11 is dill + pendulum, see:
>> https://github.com/apache/airflow/issues/35307 .
>> It does not affect all users,
>> so maybe we could resolve it by creating a "Known Incompatibilities" page
>> in the Airflow documentation.
>>
>> However, using tags without an explicit Python version is a very nice way
>> for users to shoot themselves in the foot or cast wormhole (depending on
>> your preference), especially if we talk about apache/airflow:latest. So in
>> this case I would rather say it doesn't matter which Python version we
>> use. Maybe it would even be better to choose the "golden mean" and use
>> Python 3.10, but this is also controversial because it requires deciding
>> when we need to shift this selection. Selecting between the lowest and the
>> highest version is more straightforward than selecting a "golden mean".
>>
>> The performance and environment part is also important; however, we need
>> to take into account that Airflow is an application which uses its DB
>> backend very intensively, and this is a point where the total performance
>> advantages of changing the Python version are dramatically reduced.
>>
>> As an outcome, I do not have any objection to these potential changes,
>> because there is no real difference between the strategies of "default"
>> Python selection: all of them have pros and cons. In my personal opinion,
>> using the Airflow image without pinning the Python version causes more
>> problems than it gives advantages, and you might have a nice time
>> debugging at the moment when the "default" version changes - and it would
>> change in any case.
>>
>> ----
>> Best Wishes
>> *Andrey Anshin*
>>
>>
>>
>> On Fri, 17 Nov 2023 at 10:10, Wei Lee <[email protected]> wrote:
>>
>> > Agreed, as long as users can still use different versions through tags,
>> > there are no drawbacks or incompatibilities with this great idea!
>> >
>> > Best,
>> > Wei
>> >
>> > > On Nov 17, 2023, at 1:39 PM, Aritra Basu <[email protected]> wrote:
>> > >
>> > > Agreed, moving to latest by default sounds like a fine idea.
>> > > I don't see any drawbacks to it, and it seems like as good a time as
>> > > any to make the switch with 2.8.0.
>> > >
>> > > --
>> > > Regards,
>> > > Aritra Basu
>> > >
>> > > On Fri, Nov 17, 2023, 12:33 AM Vincent Beck <[email protected]> wrote:
>> > >
>> > >> I agree, by default we should use the latest Python version. Like any
>> > >> package manager, if the user does not explicitly specify a version,
>> > >> the latest should be used. If the user wants to use a lower version,
>> > >> they can always pin it.
>> > >>
>> > >> On 2023/11/16 12:06:17 Jarek Potiuk wrote:
>> > >>> Hello everyone,
>> > >>>
>> > >>> Since we are close to the Airflow 2.8.0 release, I would like to
>> > >>> propose a change in the approach for our "default" images.
>> > >>>
>> > >>> Currently there are a few images that are considered "default", for
>> > >>> example:
>> > >>>
>> > >>> apache/airflow:latest
>> > >>> apache/airflow:2.7.4
>> > >>>
>> > >>> Currently (according to our process [1] and user documentation [2])
>> > >>> those point to the "oldest" Python version we support (currently they
>> > >>> point to Python 3.8).
>> > >>>
>> > >>> There is no particular reason why it is like that, and with Airflow
>> > >>> 2.8.0 we have an opportunity to change it and point the default
>> > >>> images to the "latest supported" version (and keep this version as
>> > >>> default for the whole MINOR line of releases).
>> > >>>
>> > >>> In the case of Airflow 2.8.*, that would mean Python 3.11 being the
>> > >>> default for the whole 2.8.* line, unless we manage to get Python 3.12
>> > >>> support in our CI before we release Airflow 2.8.0; then it would be
>> > >>> Python 3.12.
>> > >>>
>> > >>> We do not have any SemVer promises about that. Users can still choose
>> > >>> to use the "2.8.0-python3.8" tag if they want.
>> > >>>
>> > >>> Generally, going to 2.8 should always be a deliberate action, so we
>> > >>> have a chance to explain in the release notes what to do if they want
>> > >>> to stay on Python 3.8 with the 2.8 release. So they are not "losing"
>> > >>> anything; they can have 100% compatibility by just choosing a
>> > >>> different image in their deployment. This **might** cause a little
>> > >>> hassle when they migrate if they find some incompatibilities, but
>> > >>> generally speaking it's a very straightforward and simple change -
>> > >>> just adding "-python3.8" to your TAG - whatever deployment option you
>> > >>> have. And our users will have to go through it anyway every time we
>> > >>> drop an old Python version (and that change might be even more
>> > >>> costly, as they have no choice then) - so it changes very little, it
>> > >>> just shifts the time when they will have to do it.
>> > >>>
>> > >>> There are benefits of doing it - for both our users and, well, the
>> > >>> environment as well (and I really mean a positive impact on the
>> > >>> "world environment", to be honest). Maybe a little impact - but with
>> > >>> Airflow's popularity, it might make a (small) difference. Python 3.11
>> > >>> is generally 30% faster than previous versions, and using it by
>> > >>> default means that 30% less CPU is being wasted. It will also mean
>> > >>> actual money savings for our users. Also, Python 3.12 comes with even
>> > >>> more performance improvements, and keeping up with those as the
>> > >>> "default" is a pretty good idea.
>> > >>>
>> > >>> I cannot think of any other drawbacks of this change.
>> > >>>
>> > >>> WDYT?
>> > >>>
>> > >>> [1] Documented versioning approach:
>> > >>> https://github.com/apache/airflow#base-os-support-for-reference-airflow-images
>> > >>> [2] User documentation:
>> > >>> https://airflow.apache.org/docs/docker-stack/index.html
>> > >>>
>> > >>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: [email protected]
>> > >> For additional commands, e-mail: [email protected]
