hi folks,

Surfacing a JIRA discussion ([4]) to the mailing list.
The manylinux1 ABI was developed to provide a mechanism for portable Python packages with pre-compiled binary extensions supporting C and C++, including C++11, on a wide variety of Linux distributions without the need for distribution-specific packages. This is accomplished using RedHat's devtoolset-2, which performs selective static linking of the libstdc++ symbols that would otherwise cause ABI conflicts on systems with older standard libraries. The base image for producing these binaries is specified in a Dockerfile [1].

The problem we are having is that some C++ libraries, notably Google's Abseil C++ library, require a version of glibc that is too new for RHEL5 (which ships glibc 2.5). By building with CentOS6 / RHEL6 as the base image, we would get a new enough glibc (version 2.12), but building against glibc 2.12 would leave the RHEL5 users behind. The manylinux2010 standard currently under discussion uses RHEL6 as its base, but it is not yet finalized or in production.

Some modern C++ projects shipping to Python have already left the manylinux1 standard behind even though their Python binaries claim to implement it. Both PyTorch and TensorFlow are tagged as manylinux1 although they are built against a different ABI; see [2] and [3] for examples.

In my view there are two paths forward, neither perfect:

1) Stick with the manylinux1 ABI and do not use third-party libraries requiring a newer glibc.

2) "Cheat" on manylinux1 by using CentOS6 instead of CentOS5 as the base image for the wheel builds. This is what PyTorch is doing.

Since CentOS5 / RHEL5 are already past EOL, their users would be the primary casualties, but I'm not sure how many users would be affected. My guess is that they represent a small minority of our users at this point. RedHat is offering extended support for RHEL5 through the end of 2020, but those are probably fairly exceptional cases and unlikely (IMHO) to be working on the bleeding edge of Python data engineering.
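To make the glibc constraint concrete: the manylinux1 tag promises glibc 2.5 (RHEL5), while a CentOS6 build needs glibc 2.12. A wheel built on CentOS6 but tagged manylinux1 will still be accepted by the installer on a glibc 2.5 system and then fail at import time with a missing GLIBC symbol version. The snippet below is a minimal sketch, modeled loosely on the runtime check described in PEP 513, of how one could query a machine's glibc version. The helper names and constants are hypothetical illustrations, not pip's actual implementation.

```python
import ctypes
import re
from typing import Optional, Tuple

# Minimum glibc versions implied by each tag (PEP 513 / PEP 571).
MANYLINUX1_GLIBC = (2, 5)      # CentOS5 / RHEL5
MANYLINUX2010_GLIBC = (2, 12)  # CentOS6 / RHEL6


def glibc_version() -> Optional[Tuple[int, int]]:
    """Return the running glibc version as (major, minor), or None if not glibc."""
    try:
        # CDLL(None) gives access to symbols already loaded into this process.
        process_namespace = ctypes.CDLL(None)
    except (OSError, TypeError):
        return None
    try:
        gnu_get_libc_version = process_namespace.gnu_get_libc_version
    except AttributeError:
        # Not linked against glibc (e.g. musl-based Linux, macOS).
        return None
    gnu_get_libc_version.restype = ctypes.c_char_p
    version_str = gnu_get_libc_version().decode("ascii")
    match = re.match(r"(\d+)\.(\d+)", version_str)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_compatible(required: Tuple[int, int]) -> bool:
    """True if the running glibc is at least `required`."""
    version = glibc_version()
    return version is not None and version >= required


if __name__ == "__main__":
    print("glibc:", glibc_version())
    print("manylinux1 baseline OK:", is_compatible(MANYLINUX1_GLIBC))
    print("CentOS6-built binaries OK:", is_compatible(MANYLINUX2010_GLIBC))
```

On a RHEL5 machine the first check would pass and the second would fail, which is exactly the gap Option 2 opens up.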
Personally, I would like to go with Option 2 and hope that this particular Python packaging issue gets sorted out in the next 12-24 months, as we've already suffered problems due to TensorFlow's and PyTorch's non-conformity with the manylinux1 ABI. I'm interested in the opinions of others.

- Wes

[1]: https://github.com/pypa/manylinux/blob/master/docker/Dockerfile-x86_64
[2]: https://github.com/NVIDIA/nvidia-docker/issues/348#issuecomment-288875848
[3]: https://github.com/pypa/manylinux/issues/96
[4]: https://issues.apache.org/jira/browse/ARROW-2461