Hi folks,

I would like to highlight some outstanding problems with our packages.
1. Our Arrow C++ static libraries are generally unusable.

Whenever -DARROW_JEMALLOC=ON or any dependency is built in BUNDLED mode, libarrow.a (or other static libraries) cannot be used for linking. That's because the static library has a dependency on the bundled static libraries, which are _not_ packaged with the Arrow static libraries.

The preferred solution seems to be ARROW-7605. I demonstrated how this works in https://github.com/apache/arrow/pull/6220 but I need someone to help with the PR to deal with the other BUNDLED dependencies. I likely won't be able to complete the PR myself in time for the next release.

2. Our Python packages are unacceptably large.

On Linux, wheels are now 64MB and take up 218MB after installation. There are two problems here: an immediate, serious one that is easier to fix but has gone unresolved, and a separate structural one that is more difficult to fix. See the directory listing:

https://gist.github.com/wesm/57bd99798a2fa23ef3cb5e4b18b5a248

We're duplicating all of the shared libraries inside the wheel and on disk. It's unfortunate that we've allowed this problem to persist for a whole year or more:

https://issues.apache.org/jira/browse/ARROW-5082

I also recently opened

https://issues.apache.org/jira/browse/ARROW-8518

which describes a proposal to create some tools to assist with building "parent" and "child" Python packages. This would enable us to ship components like Flight and Gandiva as separate wheels. This is a large project, but one that will ultimately be necessary for the long-term scalability and sustainability of the project.

I am not able to personally work on either of these projects in the current release cycle, but I hope that some progress can be made on them, since they have lingered for a long time and it would be good for us to "put our best foot forward" with the 1.0.0 release.

Thanks,
Wes
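
For anyone who wants to measure the shared-library duplication described in item 2 on their own install, here is a rough sketch. It assumes the duplicates are byte-identical copies stored under different file names (for example, versioned and unversioned names of the same library); that is an assumption on my part, not something the script verifies about any particular wheel. The function names find_duplicate_files and wasted_bytes are made up for this illustration and are not part of Arrow or any packaging tool.

```python
import hashlib
import os
from collections import defaultdict


def looks_like_shared_library(name):
    """Heuristic name check; also matches versioned names like libfoo.so.1."""
    return (
        name.endswith((".so", ".dylib", ".dll", ".pyd"))
        or ".so." in name
    )


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large libraries are not read in one go."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(root):
    """Return groups (sorted lists of paths) of byte-identical libraries."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not looks_like_shared_library(name):
                continue
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # a symlink does not cost extra disk space
            by_digest[sha256_of_file(path)].append(path)
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)


def wasted_bytes(groups):
    """Bytes that would be saved by keeping a single copy per group."""
    return sum(os.path.getsize(group[0]) * (len(group) - 1) for group in groups)
```

To use it, pass the directory of an installed package (or an unzipped wheel) to find_duplicate_files and feed the result to wasted_bytes. The name check is only a heuristic, so treat the numbers as an estimate.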
