potiuk commented on code in PR #27264: URL: https://github.com/apache/airflow/pull/27264#discussion_r1008030506
########## Dockerfile: ########## @@ -1215,7 +1215,7 @@ ARG ADDITIONAL_PYTHON_DEPS="" # are compatible with the new protobuf version. All the google python client libraries need # to be upgraded to >=2.0.0 in order to able to lift that limitation # https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates -ARG EAGER_UPGRADE_ADDITIONAL_REQUIREMENTS="dill<0.3.3 pyarrow>=6.0.0 protobuf<4.21.0" +ARG EAGER_UPGRADE_ADDITIONAL_REQUIREMENTS="dill<0.3.3 pyarrow>=6.0.0;python_version==\"3.10\" protobuf<4.21.0" Review Comment: Not really. This one is really interesting and what we have is just a workaround to strange `pip` behaviour I noticed. The problem is that `pip` cannot properly resolve pyarrow version ONLY on 3.10 -> 3.7, 3.8, 3.9 are all fine and produce pyarrow > 6. But 3.10 is not. I do not know about 3.11 is (there is no pyarrow wheel compatible for 3.11 yet and trying to compile current version fails in our Docker image). We are waiting for release and I think PyArrow team is close to release 3.11 compatible binary wheels (so the fact it does not compile is nothing strange - it simply is not compatible with 3.11 and they had some difficulties to make it work) - see that comment from yesterday https://github.com/apache/arrow/pull/14499#issuecomment-1293574736 (I am following up the issues in all the dependencies of ours). The reason why I added `==` is that I just checked if it is stil needed (because I hope it will fix itself eventually). It is still the case, so I decided to make it more explicit that this is python 3.10 only. See the comment above - it always been about Python 3.10 being the reason for adding pyarrow lower bound, but I previoulsly added for all the versions - unnecessarily. Once pyarrow has 3.11 compatible version, we might change it to >= 3.10, but I hope we will not have to - because this might be very specific to Python 3.10 dependency quirks. What happens when we do not have this entry is that on Python 3.10 (and only thre) we end-up with an older version of Pyarrow when installing airflow with all extras from the scratch. Why? No idea. But we do - and it fails tests. Since we do not have pyarrow at all as direct dependency (they are all transitive) - we cannot limit it for airflow in install_requires (we do not want to depend on pyarrow). And since it is used transitively by many depenencies, we have no single place to add it. This is likely result of some complex dependency chain/imperfect resolution algorithm. And it is not happening always - only when we choose to install "all" providers at the same time. So this pyarrow>=6.0.0 is really a "helping hand" I give to `pip` to get resolution better. I do not know the root cause of it (and It would take quite some time I think to figure out and make an easy reproducible case so I have not even reported it to `pip` hoping it will be resolved with time - either by newer dependency versions or `pip` resolver improvements. This is a bit different case than dill and protobuf in the same line (as you see they have lower-bound not upper-bount. In case of "dill" and "protobuf" - itis absolutely necessary for eager upgrade, because `pip` resolves direct dependencies when there are eaager upgrade before looking at "extra" requirements and if we do not add those two limitations, eager upgrade will upgrade them to newer versions and then it will conflict with limits that are in extras. In case of pyarrow it is an opposite problem - we WANT pyarrow to be newer (because some of our dependencies need it) but somehow `pip` chosoes older version when resolving so we need to help `pip` to make better decisions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
