Hi Rusty,

On 02/10/2022 at 22:51, Rusty Conover wrote:
Hi Arrow Team,

I'm using Apache Arrow with AWS Lambda functions. The primary motivation is AWS Athena's user-defined functions[1]. Those functions process and return Arrow IPC segments.

* The published Python wheels for Apache Arrow include almost every feature of Arrow (Gandiva, Plasma, Flight).
Gandiva isn't compiled in the Python wheels. Plasma is reasonably small (but is also being deprecated soon). Flight is more sizable. However, most of the size seems to be in Arrow itself and Parquet. A large part of the size is probably attributable to the Arrow compute engine and functions, and also perhaps to filesystem implementations such as S3 and GCS (due to the large third-party dependencies that they bundle).
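One rough way to check where the bytes go in a particular install is to list the largest files inside the installed `pyarrow` package. The following is a minimal sketch that assumes a regular (non-editable) install. The helper name `largest_pyarrow_files` is hypothetical, and the breakdown will vary by platform, version, and how the package was built:

```python
import importlib.util
import os


def largest_pyarrow_files(top_n=10):
    """Return (size_in_bytes, relative_path) pairs for the largest files
    in the installed pyarrow package, biggest first.

    Returns an empty list if pyarrow is not installed.
    """
    try:
        spec = importlib.util.find_spec("pyarrow")
    except (ImportError, ValueError):
        spec = None
    if spec is None or not spec.submodule_search_locations:
        return []
    # Directory of the installed pyarrow package.
    root = list(spec.submodule_search_locations)[0]
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Skip symlinks so a library reachable under two names
                # is not counted twice.
                continue
            sizes.append((os.path.getsize(path), os.path.relpath(path, root)))
    # Largest files first.
    sizes.sort(reverse=True)
    return sizes[:top_n]


if __name__ == "__main__":
    for size, rel in largest_pyarrow_files():
        print(f"{size / 1e6:8.1f} MB  {rel}")
```

The shared libraries at the top of the list give a rough per-component breakdown (core Arrow, Parquet, Flight, and so on). This is only an on-disk view and says nothing about which features a given workload actually needs.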
Would it be possible to create a new Python package (e.g., "pyarrow-slim") that would disable some of the functionality but result in smaller Python wheels?
Perhaps. The first step would be to allow disabling more components in PyArrow, though. Otherwise I'm afraid the size reduction wouldn't be terrific.
Regards,
Antoine.