Hi folks,

I would like to highlight some outstanding problems with our packages.

1. Our Arrow C++ static libraries are generally unusable.

Whenever -DARROW_JEMALLOC=ON is set or any dependency is built in
BUNDLED mode, libarrow.a (or other static libraries) cannot be used
for linking. That's because the static library has a dependency on the
bundled static libraries, which are _not_ packaged with the Arrow
static libraries.

The preferred solution seems to be ARROW-7605. I demonstrated how this works in

https://github.com/apache/arrow/pull/6220

but I need someone to help with the PR to deal with other BUNDLED
dependencies. I likely won't be able to complete the PR myself in time
for the next release.
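
For anyone picking this up: one common way to fold several static
archives into a single one with GNU binutils is an `ar` MRI script. The
sketch below only generates such a script. The function name and the
archive names are hypothetical placeholders, and this is not
necessarily the approach the PR above takes.

```python
from typing import Iterable


def make_mri_script(output: str, archives: Iterable[str]) -> str:
    """Return a GNU ar MRI script that merges `archives` into `output`."""
    lines = [f"create {output}"]
    # addlib copies every member of an existing archive into the new one
    lines += [f"addlib {archive}" for archive in archives]
    lines += ["save", "end"]
    return "\n".join(lines) + "\n"


# Hypothetical archive names, for illustration only.
script = make_mri_script(
    "libarrow_bundled.a",
    ["libarrow.a", "libjemalloc_pic.a"],
)
print(script)
```

With GNU ar, the resulting text would be fed in as `ar -M < script.mri`.
Other toolchains need a different mechanism.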

2. Our Python packages are unacceptably large.

On Linux, wheels are now 64MB and take up 218MB after installation.
There are two problems here: an immediate, serious one that has gone
unresolved but is easier to fix, and a separate structural one that is
more difficult to fix. See the directory listing

https://gist.github.com/wesm/57bd99798a2fa23ef3cb5e4b18b5a248

We're duplicating all of the shared libraries inside the wheel and on
disk. It's unfortunate that we've allowed this problem to persist for
a year or more:

https://issues.apache.org/jira/browse/ARROW-5082
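
To check an installed package for this kind of duplication, something
like the following minimal sketch could be used. It groups regular
files by content hash and reports how many bytes the extra copies cost.
The function names are my own, and the path in the usage comment is a
hypothetical placeholder.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map digest -> sorted paths for files under `root` with identical content."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # a symlink does not store a second copy
            by_digest[file_digest(path)].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}


def wasted_bytes(duplicates):
    """Bytes taken by every copy beyond the first in each duplicate group."""
    return sum(
        os.path.getsize(paths[0]) * (len(paths) - 1)
        for paths in duplicates.values()
    )


# Usage (hypothetical path):
#   dups = find_duplicates("/path/to/site-packages/pyarrow")
#   print(wasted_bytes(dups))
```

Symlinks are skipped on purpose, since a versioned library name that is
only a symlink takes no extra space.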

I also recently opened

https://issues.apache.org/jira/browse/ARROW-8518

which describes a proposal to create some tools to assist with
building "parent" and "child" Python packages. This would enable us to
ship components like Flight and Gandiva as separate wheels. This is a
large project but one that will ultimately be necessary for the
long-term scalability and sustainability of the project.

I am not able to personally work on either of these projects in the
current release cycle, but I hope that some progress can be made on
these since they have lingered on for a long time, and it would be
good for us to "put our best foot forward" with the 1.0.0 release.

Thanks,
Wes
