[ 
https://issues.apache.org/jira/browse/ARROW-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943279#comment-16943279
 ] 

Wes McKinney commented on ARROW-6776:
-------------------------------------

Our wheel build scripts can be found here:

https://github.com/apache/arrow/tree/master/python/manylinux1

It's easy to build your own wheels; just follow the README.

We ship many optional components that you can turn off to make smaller 
wheels.
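Here is a rough sketch of what turning components off might look like for a wheel that only needs IPC. The CMake options (`ARROW_PYTHON`, `ARROW_FLIGHT`, `ARROW_GANDIVA`, `ARROW_ORC`, `ARROW_PLASMA`, `ARROW_PARQUET`) and the `PYARROW_WITH_*` / `PYARROW_BUNDLE_ARROW_CPP` environment variables are assumed to be the ones read by the Arrow C++ build and pyarrow's `setup.py`. Check them against the README and build scripts for the Arrow version you are building. `ARROW_CMAKE_FLAGS` is only a hypothetical helper variable.

{code:bash}
# Hypothetical trimmed build settings; verify every name against the
# README and build scripts for your Arrow version before relying on them.

# Arrow C++: keep the Python bindings library, skip the optional ones so
# they are never built or bundled. Pass this to the cmake invocation.
ARROW_CMAKE_FLAGS="-DARROW_PYTHON=ON -DARROW_FLIGHT=OFF -DARROW_GANDIVA=OFF -DARROW_ORC=OFF -DARROW_PLASMA=OFF -DARROW_PARQUET=OFF"

# pyarrow setup.py: do not build the matching Cython extension modules,
# and bundle the Arrow C++ libraries that were built into the wheel.
export PYARROW_WITH_FLIGHT=0
export PYARROW_WITH_GANDIVA=0
export PYARROW_WITH_ORC=0
export PYARROW_WITH_PLASMA=0
export PYARROW_WITH_PARQUET=0
export PYARROW_BUNDLE_ARROW_CPP=1
{code}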

The duplicated shared library issue is 
https://issues.apache.org/jira/browse/ARROW-5082. You are welcome to try to 
resolve it. My team and I have decided not to spend time on wheel-related 
issues anymore, but other Arrow community members are welcome to do as they 
wish.
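One likely cause of the duplication is that the wheel format does not preserve symlinks, so `libX.so` and `libX.so.N` end up as two full copies. As a local workaround when assembling a bundle (for example for AWS Lambda), a minimal sketch follows. It finds unversioned `lib*.so` files that are byte-identical to a versioned sibling and can optionally replace the unversioned copy with a relative symlink. `find_so_duplicates` is a hypothetical helper. The assumption that nothing depends on the unversioned file being a regular file should be tested before shipping.

{code:bash}
# find_so_duplicates DIR [link]   (hypothetical helper, POSIX sh)
# For each unversioned lib*.so in DIR, look for a versioned sibling
# (lib*.so.*) with byte-identical contents and print "plain -> versioned".
# With "link" as the second argument, also replace the unversioned copy
# with a relative symlink pointing at the versioned file.
find_so_duplicates() {
  dir=$1
  mode=${2:-report}
  for plain in "$dir"/lib*.so; do
    # Skip an unmatched glob pattern and names that are already symlinks.
    [ -f "$plain" ] && [ ! -L "$plain" ] || continue
    for versioned in "$plain".*; do
      [ -f "$versioned" ] && [ ! -L "$versioned" ] || continue
      if cmp -s "$plain" "$versioned"; then
        printf '%s -> %s\n' "${plain##*/}" "${versioned##*/}"
        if [ "$mode" = link ]; then
          # -f replaces the existing regular file with the symlink.
          ln -sf "${versioned##*/}" "$plain"
        fi
        break
      fi
    done
  done
}
{code}

Run it once without `link` to see which pairs would be collapsed. Note that the symlinks only survive packaging if the archive is created with symlink preservation (for Info-ZIP, `zip -y`).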

> [Python] Need a lite version of pyarrow
> ---------------------------------------
>
>                 Key: ARROW-6776
>                 URL: https://issues.apache.org/jira/browse/ARROW-6776
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 0.14.1
>            Reporter: Haowei Yu
>            Priority: Major
>
> Currently I am building a library package on top of pyarrow, so I include 
> pyarrow as a dependency and ship it to our customers. However, when our 
> customers install our package, it also installs pyarrow and pyarrow's 
> dependency (numpy), and the combined size is huge. 
> {code:bash}
> (py36env) [hyu@c6x64-hyu-newuser-final-clone connector]$ ls -l --block-size=M /home/hyu/py36env/lib/python3.6/site-packages/pyarrow/
> total 186M
> {code}
> numpy adds around 80 MB, so the total is more than 250 MB.
> Our customers want to bundle all dependencies and run the code inside AWS 
> Lambda; however, they hit the size limit and could not run the code.
> Looking into pyarrow, I saw that multiple .so files are shipped both with and 
> without a version suffix. Could you remove one of each pair (either the one 
> with or the one without the suffix)? That alone would reduce the package 
> size by half.
> Further, our library only needs IPC, to read data as record batches; I 
> don't need Arrow Flight at all (its .so file is the biggest, at around 
> 100 MB). Could you publish a lite version of pyarrow so that I can specify 
> the lite version as the dependency? Otherwise I may need to build my own 
> lite version and push it to PyPI. However, that approach causes further 
> problems if our customers also use the "fat" version of pyarrow, unless the 
> lite version lives in a different namespace.
> Another alternative is to bundle pyarrow with our library (copy the 
> whole directory into a vendored namespace) and ship it to our customers 
> without declaring pyarrow as a dependency. The advantage is that I can 
> build pyarrow with whatever options, sub-modules, and libraries I need. 
> However, I tried hard and failed, because pyarrow uses absolute imports, so 
> its modules fail to import from the new location.
> Any insight into how I should resolve this issue?
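To reproduce size measurements like the ones above and see which shared libraries dominate, a small sketch follows. It uses hypothetical POSIX sh helpers (`total_bytes`, `fits_in_limit`, `largest_libs`). Sizes here are apparent file sizes with symlinks not counted, so they can differ somewhat from the block counts reported by `ls -l` or `du`.

{code:bash}
# Hypothetical size-report helpers (POSIX sh).

# total_bytes PATH...: apparent size in bytes of all regular files under
# the given paths (find does not follow or count symlinks here).
total_bytes() {
  find "$@" -type f -exec cat {} + | wc -c | tr -d ' '
}

# fits_in_limit LIMIT_BYTES PATH...: print the usage and succeed only if
# the total is within LIMIT_BYTES.
fits_in_limit() {
  limit=$1
  shift
  used=$(total_bytes "$@")
  echo "$used of $limit bytes used"
  [ "$used" -le "$limit" ]
}

# largest_libs DIR [N]: list the N (default 10) largest shared libraries
# directly in DIR, biggest first, as "<bytes> <name>".
largest_libs() {
  dir=$1
  n=${2:-10}
  for f in "$dir"/*.so "$dir"/*.so.*; do
    # Skip unmatched glob patterns and symlinked names.
    [ -f "$f" ] && [ ! -L "$f" ] || continue
    printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "${f##*/}"
  done | sort -rn | head -n "$n"
}
{code}

For example, `largest_libs /home/hyu/py36env/lib/python3.6/site-packages/pyarrow 5` would show whether the Flight library is the largest file, and `fits_in_limit $((250 * 1024 * 1024)) <site-packages>/pyarrow <site-packages>/numpy` compares the two packages against Lambda's 250 MB unzipped deployment-package quota. Treating 250 MB as 250 MiB is an assumption, and the quota also counts layers and your own code.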



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
