I have a similar problem with a conda build that follows the pyarrow build 
instructions.
Everything works on the build machine, but when I build and install the wheel, 
it ends up missing some libraries (libutf8, for starters). I’m fairly new to 
this: could someone spell out how to do a (minimal) build that still includes 
all the conda-provided libraries in the wheel?
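
In case it helps anyone answering: one way to see which libraries a built 
extension still cannot resolve is to filter the output of `ldd`. This is only 
a sketch; the helper name `missing_libs` is mine, and the path in the usage 
comment is a placeholder rather than anything from the pyarrow docs.

```shell
#!/usr/bin/env bash
# Print the names of shared libraries that `ldd` reports as unresolved.
# Reads ldd-style output on stdin, so it can be tried without a real binary.
missing_libs() {
  awk '/=> not found/ { print $1 }' | sort -u
}

# Hypothetical usage (the path is an assumption; adjust it to your install):
#   ldd /path/to/site-packages/pyarrow/lib.cpython-*.so | missing_libs
```

Tools such as `auditwheel repair` exist to copy external shared libraries into 
a Linux wheel, and the Arrow Python development guide describes a 
`PYARROW_BUNDLE_ARROW_CPP` build setting, but I have not confirmed that either 
one picks up the conda-provided dependencies.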

On 2023/11/27 18:38:55 Akshara Sadheesh wrote:
> Thank you so much for your reply, Raul! I did run the build using the 
> build_venv.sh file. I think the issue was that I had not copied the 
> libarrow.so files out of the `root/dist/lib` directory in my Docker 
> container. I have now added them to `arrow/python/pyarrow`. 
> 
> After the build finished, I copied the libarrow.so files from 
> `root/dist/lib` in my container to my host machine and added them to the 
> `arrow/python/pyarrow` folder. This got rid of the missing libarrow.so 
> error.
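
For reference, the copy step described here could be scripted roughly as 
follows. This is a sketch under my own assumptions: `copy_arrow_libs` is a 
hypothetical helper, and the directories in the usage comment are taken from 
the message above rather than verified. `cp -a` is used so that the versioned 
symlinks (for example libarrow.so.1500 pointing at the full-version file) stay 
symlinks instead of becoming duplicate copies.

```shell
#!/usr/bin/env bash
# Copy libarrow shared objects from an install prefix into a target directory,
# keeping symlinks as symlinks so the libraries are not duplicated.
copy_arrow_libs() {
  local src_dir="$1" dest_dir="$2"
  mkdir -p "$dest_dir"
  cp -a "$src_dir"/libarrow*.so* "$dest_dir"/
}

# Hypothetical usage, with paths as described in the message above:
#   copy_arrow_libs "$ARROW_HOME/lib" /arrow/python/pyarrow
```
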
> 
> I then added this new pyarrow folder to my Lambda layers folder; the 
> deploy.sh script takes care of building out the new environment using 
> CodeBuild. I am using a managed Ubuntu Standard 6.0 image 
> (https://github.com/aws/aws-codebuild-docker-images/blob/master/ubuntu/standard/6.0/Dockerfile), 
> which uses glibc version 2.35. As far as possible, I would like to avoid 
> changing the glibc version, as this is a managed image.
> 
> Issue:
> 
> The issue is when I add the custom pyarrow to my lambda layers and run the 
> step function I get this error:
> 
> `GLIBC_2.32* not found (required by 
> /opt/python/pyarrow/lib.cpython-310-x86_64-linux-gnu.so)`
> 
> I keep running into a glibc version error. The error persists even after 
> modifying the Dockerfile to use the same base image as the CodeBuild managed 
> image, which has GLIBC 2.35. 
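
A note on this error: the message comes from the dynamic loader in the 
environment where the library is imported, so what matters is the glibc 
available in the Lambda runtime, which may differ from the one in the build 
image. As a rough check (a sketch, with a helper name of my own, 
`max_glibc_required`), the highest glibc symbol version a shared object asks 
for can be read from `objdump -T` output:

```shell
#!/usr/bin/env bash
# Read `objdump -T`-style output on stdin and print the highest GLIBC_x.y
# symbol version it references (prints nothing if there is none).
# Relies on `sort -V` (version sort), which GNU coreutils provides.
max_glibc_required() {
  grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -u -V | tail -n 1
}

# Usage, with the path taken from the error message above:
#   objdump -T /opt/python/pyarrow/lib.cpython-310-x86_64-linux-gnu.so | max_glibc_required
```

The result can then be compared with what `getconf GNU_LIBC_VERSION` prints in 
the environment where the library is actually loaded.
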
> 
> This is the modified `arrow/python/examples/minimal_build/Dockerfile.ubuntu` 
> used:
> 
> `
> 
> FROM public.ecr.aws/ubuntu/ubuntu:22.04
> 
> ENV DEBIAN_FRONTEND=noninteractive
> 
> RUN apt-get update -y -q && \
>     apt-get install -y -q --no-install-recommends \
>         apt-transport-https \
>         software-properties-common \
>         wget && \
>     apt-get install -y -q --no-install-recommends \
>         build-essential \
>         cmake \
>         git \
>         ninja-build \
>         python3.10 \
>         python3.10-dev \
>         python3.10-venv \
>         && \
>     apt-get clean && rm -rf /var/lib/apt/lists*
> 
> # Set Python 3.10 as the default Python version
> RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
> 
> RUN wget https://bootstrap.pypa.io/get-pip.py && \
> python3 get-pip.py && \
> rm get-pip.py
> 
> `
> 
> This is the `arrow/python/examples/minimal_build/build_venv.sh` used:
> 
> 
> `
> 
> #!/usr/bin/env bash
> # Licensed to the Apache Software Foundation (ASF) under one
> # or more contributor license agreements. See the NOTICE file
> # distributed with this work for additional information
> # regarding copyright ownership. The ASF licenses this file
> # to you under the Apache License, Version 2.0 (the
> # "License"); you may not use this file except in compliance
> # with the License. You may obtain a copy of the License at
> #
> # http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing,
> # software distributed under the License is distributed on an
> # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> # KIND, either express or implied. See the License for the
> # specific language governing permissions and limitations
> # under the License.
> 
> set -e
> 
> #----------------------------------------------------------------------
> # Change this to whatever makes sense for your system
> 
> WORKDIR=${WORKDIR:-$HOME}
> MINICONDA=$WORKDIR/miniconda-for-arrow
> LIBRARY_INSTALL_DIR=$WORKDIR/local-libs
> CPP_BUILD_DIR=$WORKDIR/arrow-cpp-build
> ARROW_ROOT=/arrow
> export ARROW_HOME=$WORKDIR/dist
> export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
> 
> python3 -m venv $WORKDIR/venv
> source $WORKDIR/venv/bin/activate
> 
> git config --global --add safe.directory $ARROW_ROOT
> 
> pip install -r $ARROW_ROOT/python/requirements-build.txt
> 
> #----------------------------------------------------------------------
> # Build C++ library
> 
> mkdir -p $CPP_BUILD_DIR
> pushd $CPP_BUILD_DIR
> 
> cmake -GNinja \
>       -DCMAKE_BUILD_TYPE=Release \
>       -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>       -DCMAKE_INSTALL_LIBDIR=lib \
>       -DCMAKE_UNITY_BUILD=ON \
>       -DARROW_BUILD_STATIC=OFF \
>       -DARROW_COMPUTE=ON \
>       -DARROW_CSV=ON \
>       -DARROW_FILESYSTEM=ON \
>       -DARROW_JSON=ON \
>       $ARROW_ROOT/cpp
> 
> ninja install
> 
> popd
> 
> #----------------------------------------------------------------------
> # Build and test Python library
> pushd $ARROW_ROOT/python
> 
> rm -rf build/ # remove any pesky pre-existing build directory
> 
> export CMAKE_PREFIX_PATH=${ARROW_HOME}${CMAKE_PREFIX_PATH:+:${CMAKE_PREFIX_PATH}}
> export PYARROW_BUILD_TYPE=Release
> export PYARROW_CMAKE_GENERATOR=Ninja
> 
> # You can run either "develop" or "build_ext --inplace". Your pick
> 
> python setup.py build_ext --inplace
> # python setup.py develop
> 
> # pip install -r $ARROW_ROOT/python/requirements-test.txt
> 
> # py.test pyarrow
> 
> `
> 
> 
> 
> I would be very thankful for any help and advice that you can offer.
> 
> Thank you very much,
> 
> Shara
> 
> 
> On 2023/11/22 14:29:49 Raúl Cumplido wrote:
> > Hi Shara,
> > 
> > The example dockerfile installs the base requirements for Ubuntu but
> > then we use the build_venv.sh (or build_conda.sh) to build the Arrow
> > CPP library and then pyarrow [1].
> > 
> > From the error it seems you did not build Arrow CPP as libarrow.so
> > can't be found. Can you try following the recipe on the provided sh
> > file?
> > 
> > Kind regards,
> > Raúl
> > 
> > [1] 
> > https://github.com/apache/arrow/blob/main/python/examples/minimal_build/build_venv.sh
> > 
> > On Tue, 21 Nov 2023 at 23:05, Akshara Sadheesh
> > (<sh...@gmail.com>) wrote:
> > >
> > > Hi,
> > >
> > > I have been trying to use the minimal_build for Python with the
> > > provided example Dockerfile.ubuntu for my Lambda layers, since they
> > > have a 250 MB limit. I am able to run the build and generate a pyarrow
> > > library. However, the library does not contain any shared .so files.
> > > When in use, it says:
> > >
> > > `"Unable to import module 'lambda_function': libarrow.so.1500: cannot
> > > open shared object file: No such file or directory"`
> > >
> > > I modified the Dockerfile to use Python 3.10 and the Ubuntu 22.04
> > > image, and set `--platform linux/x86_64` when building the image to
> > > ensure it is compatible with the Lambda architecture.
> > >
> > > I would be very grateful if you could help me with this,
> > >
> > > Thank you!
> > >
> > > Shara
> >
