[ 
https://issues.apache.org/jira/browse/ARROW-12585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rodrigo Tobar updated ARROW-12585:
----------------------------------
    Attachment: example.tar.gz

> Published apt packages incompatible with pip binary wheels
> ----------------------------------------------------------
>
>                 Key: ARROW-12585
>                 URL: https://issues.apache.org/jira/browse/ARROW-12585
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Packaging, Python
>    Affects Versions: 4.0.0
>            Reporter: Rodrigo Tobar
>            Priority: Major
>         Attachments: example.tar.gz
>
>
> We have a shared library that uses the shared {{libarrow}} and {{libplasma}} 
> libraries. Our shared library is then eventually loaded by a Python 
> process where we also use {{pyarrow}}. To avoid compiling arrow/plasma 
> we are installing the {{libarrow-dev}} and {{libplasma-dev}} apt packages (as 
> per the official [instructions|https://arrow.apache.org/install/]) and the 
> binary wheel of {{pyarrow}}.
> Each method brings its own copy of {{libarrow.so.400}}, and it turns out the 
> two libraries are not identical: the library contained within {{pyarrow}} was 
> most probably compiled with an older gcc version than the copy installed via 
> apt, which is compiled using the newer CXX11 ABI from libstdc++. This wouldn't 
> have any visible effect, except that {{std::string}} (and maybe other 
> affected types) is used in some Arrow API entry points. The difference in the 
> ABI used to compile {{libarrow.so.400}} means the two copies contain 
> differently named symbols.
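
To illustrate the naming difference, here is a minimal sketch (not taken from the attachment). Under the Itanium C++ ABI used by GCC, a hypothetical function `void f(std::string)` mangles to `_Z1fSs` with the old libstdc++ ABI and to `_Z1fNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE` with the CXX11 ABI. The helper name `uses_cxx11_abi` and the function `f` are made up for this example; the check is only a heuristic on the symbol text.

```python
def uses_cxx11_abi(mangled):
    """Heuristic: does this mangled symbol refer to the libstdc++ CXX11 ABI?"""
    # Types in the std::__cxx11 inline namespace embed "__cxx11" in the
    # mangled name; functions that only return such a type carry the
    # ABI tag "B5cxx11" instead.
    return "__cxx11" in mangled or "B5cxx11" in mangled


# Hypothetical function `void f(std::string)` mangled under each ABI.
OLD_ABI_SYMBOL = "_Z1fSs"
NEW_ABI_SYMBOL = "_Z1fNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"

for symbol in (OLD_ABI_SYMBOL, NEW_ABI_SYMBOL):
    print(symbol, "->", "CXX11 ABI" if uses_cxx11_abi(symbol) else "old ABI")
```

The same check could be applied to each line of `nm -D` output for the two copies of the library to see which ABI each one exports.
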
> Back to our shared library: we load it in a Python process. If {{pyarrow}} 
> has already been imported when this happens, then *its* copy of 
> {{libarrow.so.400}} is already in memory, and loading our shared library 
> doesn't load the "apt" copy of {{libarrow.so.400}}. This means our library 
> doesn't trigger the loading of the copy of {{libarrow.so.400}} that it was 
> compiled against, and if our library refers to one of the symbols that have 
> changed name then it fails to load due to the missing symbol.
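
One way to check which copy of the library a process has actually mapped is to inspect `/proc/self/maps` on Linux. The sketch below is an assumption about how one might do that, not part of the attached example; the helper name `mapped_copies` and the sample paths are hypothetical.

```python
import os


def mapped_copies(maps_text, fragment="libarrow.so"):
    """Return the distinct mapped file paths whose basename contains `fragment`.

    `maps_text` is the content of a /proc/<pid>/maps file.
    """
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) == 6 and fragment in os.path.basename(fields[5].strip()):
            paths.add(fields[5].strip())
    return sorted(paths)


# Made-up sample resembling a process that has imported pyarrow.
SAMPLE_MAPS = (
    "7f00 r-xp 00000000 08:01 1 /opt/site-packages/pyarrow/libarrow.so.400\n"
    "7f10 r--p 00000000 08:01 1 /opt/site-packages/pyarrow/libarrow.so.400\n"
    "7f20 r-xp 00000000 08:01 2 /usr/lib/libc.so.6\n"
    "7f30 rw-p 00000000 00:00 0 \n"
)
print(mapped_copies(SAMPLE_MAPS))
```

In a live process, the input would be read with `open("/proc/self/maps").read()` after importing pyarrow and attempting to load the shared library.
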
> I've attached a fairly minimal example: a Dockerfile prepares a system with 
> libarrow-dev from apt and a binary pyarrow wheel from PyPI. It then compiles 
> a shared library against libarrow-dev. The command run by default by the 
> container is a small test that runs python and loads the example shared 
> library, both with and without loading pyarrow first. When pyarrow is loaded 
> first, a missing symbol error occurs and the shared library fails to 
> load.
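
A test along these lines could look roughly like the sketch below. It is not the attached test: the helper name `load_in_fresh_process` and the library path are hypothetical. Each attempt runs in a fresh interpreter, because a library that has been loaded stays mapped for the lifetime of the process.

```python
import subprocess
import sys


def load_in_fresh_process(lib_path, import_pyarrow_first):
    """Try to load `lib_path` with ctypes in a new interpreter.

    Returns (ok, stderr), where `ok` is True if the load succeeded.
    """
    prelude = "import pyarrow\n" if import_pyarrow_first else ""
    script = prelude + "import ctypes, sys\nctypes.CDLL(sys.argv[1])\n"
    proc = subprocess.run(
        [sys.executable, "-c", script, lib_path],
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.stderr


if __name__ == "__main__":
    # Hypothetical path to the library built against libarrow-dev.
    for first in (False, True):
        ok, err = load_in_fresh_process("./libexample.so", first)
        print("pyarrow first:", first, "-> loaded:", ok)
        if not ok:
            print(err.strip().splitlines()[-1])
```

With the setup described above, the expectation would be that only the run that imports pyarrow first fails, with an undefined-symbol error in its last stderr line.
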
> I've experienced this in an Ubuntu-based Linux distro and against Arrow 
> 4.0.0, but I'd assume this happens in other distros and versions.
> The workaround we are using at the moment is simple: we are installing a 
> pyarrow version that is different from the arrow version installed via apt. 
> We are lucky we can run in this mixed-version, multiple-libraries-loaded 
> scenario, but this might not be an option for everyone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
