WillAyd commented on code in PR #45854:
URL: https://github.com/apache/arrow/pull/45854#discussion_r2551137886
##########
.github/workflows/python.yml:
##########
@@ -175,7 +175,7 @@ jobs:
ARROW_BUILD_TESTS: OFF
PYARROW_TEST_LARGE_MEMORY: ON
# Current oldest supported version according to
https://endoflife.date/macos
- MACOSX_DEPLOYMENT_TARGET: 12.0
+ MACOSX_DEPLOYMENT_TARGET: "12.0"
Review Comment:
I enclosed this in parentheses so that it gets evaluated as the string
"12.0" and not the number; otherwise some functionality in Meson was failing to
unpack the major/minor version
Somewhat tangentially it looks like the minimum supported version right now
is 14.0 - worth updating?
##########
python/meson.build:
##########
@@ -0,0 +1,84 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+project(
+ 'pyarrow',
+ 'cython',
+ 'cpp',
+ version: run_command(
+ 'python',
+ '-m',
+ 'setuptools_scm',
+ '--force-write-version-files',
+ check: true,
+ ).stdout().strip(),
+ license: 'Apache-2.0',
+ #license_files: ['../LICENSE.txt'],
Review Comment:
This is going to reflect the same limitation that is being worked around in
https://github.com/apache/arrow/pull/47141
As far as I am aware, Meson supports the inclusion of files like this, but
meson-python has opted for a strict interpretation of PEP-639 that requires the
LICENSE.txt (and similar files) to exist within the python source. See some
upstream discussion at https://github.com/mesonbuild/meson/pull/14387
##########
ci/scripts/python_build.bat:
##########
@@ -110,30 +110,116 @@ ccache -sv
echo "=== Building Python ==="
set PYARROW_BUILD_TYPE=%CMAKE_BUILD_TYPE%
-set PYARROW_BUILD_VERBOSE=1
-set PYARROW_BUNDLE_ARROW_CPP=ON
-set PYARROW_CMAKE_GENERATOR=%CMAKE_GENERATOR%
-set PYARROW_WITH_ACERO=%ARROW_ACERO%
-set PYARROW_WITH_DATASET=%ARROW_DATASET%
-set PYARROW_WITH_FLIGHT=%ARROW_FLIGHT%
-set PYARROW_WITH_GANDIVA=%ARROW_GANDIVA%
-set PYARROW_WITH_GCS=%ARROW_GCS%
-set PYARROW_WITH_HDFS=%ARROW_HDFS%
-set PYARROW_WITH_ORC=%ARROW_ORC%
-set PYARROW_WITH_PARQUET=%ARROW_PARQUET%
-set PYARROW_WITH_PARQUET_ENCRYPTION=%PARQUET_REQUIRE_ENCRYPTION%
-set PYARROW_WITH_SUBSTRAIT=%ARROW_SUBSTRAIT%
-set PYARROW_WITH_S3=%ARROW_S3%
-set ARROW_HOME=%CMAKE_INSTALL_PREFIX%
-set CMAKE_PREFIX_PATH=%CMAKE_INSTALL_PREFIX%
+if %ARROW_ACERO% == ON (
+ set PYARROW_WITH_ACERO=enabled
+) else if %ARROW_ACERO% == OFF (
+ set PYARROW_WITH_ACERO=disabled
+) else (
+ set PYARROW_WITH_ACERO=auto
+)
+if %ARROW_DATASET% == ON (
+ set PYARROW_WITH_DATASET=enabled
+) else if %ARROW_DATASET% == OFF (
+ set PYARROW_WITH_DATASET=disabled
+) else (
+ set PYARROW_WITH_DATASET=auto
+)
+if %ARROW_FLIGHT% == ON (
+ set PYARROW_WITH_FLIGHT=enabled
+) else if %ARROW_FLIGHT% == OFF (
+ set PYARROW_WITH_FLIGHT=disabled
+) else (
+ set PYARROW_WITH_FLIGHT=auto
+)
+if %ARROW_GANDIVA% == ON (
+ set PYARROW_WITH_GANDIVA=enabled
+) else if %ARROW_GANDIVA% == OFF (
+ set PYARROW_WITH_GANDIVA=disabled
+) else (
+ set PYARROW_WITH_GANDIVA=auto
+)
+if %ARROW_GCS% == ON (
+ set PYARROW_WITH_GCS=enabled
+) else if %ARROW_GCS% == OFF (
+ set PYARROW_WITH_GCS=disabled
+) else (
+ set PYARROW_WITH_GCS=auto
+)
+if %ARROW_HDFS% == ON (
+ set PYARROW_WITH_HDFS=enabled
+) else if %ARROW_HDFS% == OFF (
+ set PYARROW_WITH_HDFS=disabled
+) else (
+ set PYARROW_WITH_HDFS=auto
+)
+if %ARROW_ORC% == ON (
+ set PYARROW_WITH_ORC=enabled
+) else if %ARROW_ORC% == OFF (
+ set PYARROW_WITH_ORC=disabled
+) else (
+ set PYARROW_WITH_ORC=auto
+)
+if %ARROW_PARQUET% == ON (
+ set PYARROW_WITH_PARQUET=enabled
+) else if %ARROW_PARQUET% == OFF (
+ set PYARROW_WITH_PARQUET=disabled
+) else (
+ set PYARROW_WITH_PARQUET=auto
+)
+if %PARQUET_REQUIRE_ENCRYPTION% == ON (
+ set PYARROW_WITH_PARQUET_ENCRYPTION=enabled
+) else if %ARROW_ACERO% == OFF (
+ set PYARROW_WITH_PARQUET_ENCRYPTION=disabled
+) else (
+ set PYARROW_WITH_PARQUET_ENCRYPTION=auto
+)
+if %ARROW_SUBSTRAIT% == ON (
+ set PYARROW_WITH_SUBSTRAIT=enabled
+) else if %ARROW_SUBSTRAIT% == OFF (
+ set PYARROW_WITH_SUBSTRAIT=disabled
+) else (
+ set PYARROW_WITH_SUBSTRAIT=auto
+)
+if %ARROW_S3% == ON (
+ set PYARROW_WITH_S3=enabled
+) else if %ARROW_S3% == OFF (
+ set PYARROW_WITH_S3=disabled
+) else (
+ set PYARROW_WITH_S3=auto
+)
+if %CMAKE_BUILD_TYPE% == Release (
+ set MESON_BUILD_TYPE=release
+) else (
+ set MESON_BUILD_TYPE=debug
+)
pushd %SOURCE_DIR%\python
@REM Install Python build dependencies
%PYTHON_CMD% -m pip install --upgrade pip || exit /B 1
%PYTHON_CMD% -m pip install -r requirements-build.txt || exit /B 1
+%PYTHON_CMD% -m pip install build delvewheel || exit /B 1
+
+@REM by default, CMake installs .lib import libs to lib and .dll libs to bin
+@REM delvewheel requires these to be side-by-side to properly vendor
+@REM https://github.com/adang1345/delvewheel/issues/66
+copy %CMAKE_INSTALL_PREFIX%\lib\*.lib %CMAKE_INSTALL_PREFIX%\bin\
Review Comment:
This is pretty hacky but I'm not sure of a way around it...open to any
thoughts.
The crux of the issue is that the CMake module on main will copy the Arrow
libraries from the system into the package. meson-python is more strict about
what you can copy into the package, and I don't believe it allows you to do
that, pushing you instead to use tools like delvewheel (see comprehensive
shared library documentation at
https://mesonbuild.com/meson-python/how-to-guides/shared-libraries.html)
Using delvewheel gets the job _mostly_ done, excepting the fact that the
.lib and .dll files are not in the same location on the system, which
delvewheel expects (see an upstream issue at
https://github.com/adang1345/delvewheel/issues/66)
I tried adding a CMake option of `-DCMAKE_INSTALL_LIBDIR=bin` to place the
.lib files near the .dll files, but that broke the CMake package finding
capabilities. I tried to then do `-DCMAKE_INSTALL_RUNTIMEDIR=lib`, but that
also had unintended side-effects (and required patching of the Arrow CMake
module, which hardcodes that to `bin` in the `ARROW_ADD_LIB` function)
This hack was the only way I could get this to work.
There's also the question of why the Windows CI job requires this in the
first place, which I _think_ has to do with the python_build.bat script
installing Arrow C++ with the VSenv toolchain, while the python_test.bat script
runs in the standard environment that appears to have the MinGW toolchain
active.
##########
python/pyarrow/meson.build:
##########
@@ -0,0 +1,475 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+cython_args = ['--include-dir', meson.current_source_dir()]
+if get_option('buildtype') in ['debug', 'debugoptimized']
+ cython_args += ['--gdb']
+endif
+
+pyarrow_srcs = files(
+ 'src/arrow/python/arrow_to_pandas.cc',
+ 'src/arrow/python/benchmark.cc',
+ 'src/arrow/python/common.cc',
+ 'src/arrow/python/config.cc',
+ 'src/arrow/python/csv.cc',
+ 'src/arrow/python/datetime.cc',
+ 'src/arrow/python/decimal.cc',
+ 'src/arrow/python/extension_type.cc',
+ 'src/arrow/python/filesystem.cc',
+ 'src/arrow/python/gdb.cc',
+ 'src/arrow/python/helpers.cc',
+ 'src/arrow/python/inference.cc',
+ 'src/arrow/python/io.cc',
+ 'src/arrow/python/ipc.cc',
+ 'src/arrow/python/numpy_convert.cc',
+ 'src/arrow/python/numpy_init.cc',
+ 'src/arrow/python/numpy_to_arrow.cc',
+ 'src/arrow/python/pyarrow.cc',
+ 'src/arrow/python/python_test.cc',
+ 'src/arrow/python/python_to_arrow.cc',
+ 'src/arrow/python/udf.cc',
+ 'src/arrow/python/util.cc',
+)
+
+subdir('src/arrow/python')
+
+if get_option('sdist')
+ arrow_compute_dep = []
+else
+ arrow_compute_dep = dependency(
+ 'arrow-compute',
+ 'ArrowCompute',
+ modules: [f'ArrowCompute::arrow_compute_@cmake_suffix@'],
+ )
+endif
+
+libpyarrow_deps = [arrow_dep, arrow_compute_dep]
+
+cython_modules = {
+ 'lib': {},
+ '_compute': {'dependencies': arrow_compute_dep},
+ '_csv': {},
+ '_feather': {},
+ '_fs': {},
+ '_json': {},
+ '_pyarrow_cpp_tests': {},
+}
+
+needs_substrait = get_option('substrait').enabled()
+needs_dataset = get_option('dataset').enabled() or needs_substrait
+needs_acero = get_option('acero').enabled() or needs_dataset
+needs_flight = get_option('flight').enabled()
+
+if needs_acero
+ if get_option('sdist')
+ arrow_acero_dep = []
+ else
+ arrow_acero_dep = dependency(
+ 'arrow-acero',
+ 'ArrowAcero',
+ modules: [f'ArrowAcero::arrow_acero_@cmake_suffix@'],
+ )
+ endif
+ libpyarrow_deps += [arrow_acero_dep]
+ cython_modules += {'_acero': {'dependencies': arrow_acero_dep}}
+endif
+
+if get_option('azure').enabled()
+ cython_modules += {'_azurefs': {}}
+endif
+
+if get_option('cuda').enabled()
+ if get_option('sdist')
+ arrow_cuda_dep = []
+ else
+ arrow_cuda_dep = dependency(
+ 'arrow-cuda',
+ 'ArrowCUDA',
+ modules: [f'ArrowCUDA::arrow_cuda_@cmake_suffix@'],
+ )
+ endif
+ libpyarrow_deps += [arrow_cuda_dep]
+ cython_modules += {'_cuda': {'dependencies': arrow_cuda_dep}}
+endif
+
+if needs_dataset
+ if get_option('sdist')
+ arrow_dataset_dep = []
+ else
+ arrow_dataset_dep = dependency(
+ 'arrow-dataset',
+ 'ArrowDataset',
+ modules: [f'ArrowDataset::arrow_dataset_@cmake_suffix@'],
+ )
+ endif
+ libpyarrow_deps += [arrow_dataset_dep]
+ cython_modules += {'_dataset': {'dependencies': arrow_dataset_dep}}
+endif
+
+if needs_flight
+ if get_option('sdist')
+ arrow_flight_dep = []
+ else
+ arrow_flight_dep = dependency(
+ 'arrow-flight',
+ 'ArrowFlight',
+ modules: [f'ArrowFlight::arrow_flight_@cmake_suffix@'],
+ )
+ endif
+ libpyarrow_deps += [arrow_flight_dep]
+endif
+
+if get_option('gandiva').enabled()
+ if get_option('sdist')
+ gandiva_dep = []
+ else
+ gandiva_dep = dependency(
+ 'gandiva',
+ 'Gandiva',
+ modules: [f'Gandiva::gandiva_@cmake_suffix@'],
+ )
+ endif
+ libpyarrow_deps += [gandiva_dep]
+ cython_modules += {'gandiva': {'dependencies': gandiva_dep}}
+endif
+
+if get_option('gcs').enabled()
+ cython_modules += {'_gcsfs': {}}
+endif
+
+if get_option('hdfs').enabled()
+ cython_modules += {'_hdfs': {}}
+endif
+
+if get_option('orc').enabled()
+ cython_modules += {'_orc': {}}
+
+ if needs_dataset
+ cython_modules += {'_dataset_orc': {'dependencies': arrow_dataset_dep}}
+ endif
+endif
+
+if get_option('s3').enabled()
+ cython_modules += {'_s3fs': {}}
+endif
+
+if needs_substrait
+ if get_option('sdist')
+ arrow_substrait_dep = []
+ else
+ arrow_substrait_dep = dependency(
+ 'arrow-substrait',
+ 'ArrowSubstrait',
+ modules: [f'ArrowSubstrait::arrow_substrait_@cmake_suffix@'],
+ )
+ endif
+ libpyarrow_deps += [arrow_substrait_dep]
+ cython_modules += {'_substrait': {'dependencies': arrow_substrait_dep}}
+endif
+
+needs_parquet = get_option('parquet').enabled()
+needs_parquet_encryption = get_option('parquet_require_encryption').enabled()
+if needs_parquet_encryption and not needs_parquet
+ warning(
+ '''
+ Building PyArrow with Parquet Encryption is requested, but Parquet
+ itself is not enabled. Ignoring the Parquet Encryption setting.,
+ ''',
+ )
+ needs_parquet_encryption = false
+endif
+
+if needs_parquet
+ if get_option('sdist')
+ parquet_dep = []
+ else
+ parquet_dep = dependency(
+ 'parquet',
+ 'Parquet',
+ modules: [f'Parquet::parquet_@cmake_suffix@'],
+ )
+ endif
+ libpyarrow_deps += [parquet_dep]
+ cython_modules += {'_parquet': {'dependencies': parquet_dep}}
+
+ if needs_dataset
+ cython_modules += {
+ '_dataset_parquet': {
+ 'dependencies': [parquet_dep, arrow_dataset_dep],
+ },
+ }
+ endif
+endif
+
+gnu_symbol_visibility = host_machine.system() == 'darwin' ? 'default' :
'inlineshidden'
Review Comment:
On my local box using `inlineshidden` reduces the overall size of shared
libraries by 5%. On macOS it was segfaulting, which from cursory research has
to do with how the PyMod_INIT function is exposed
##########
python/pyarrow/tests/util.py:
##########
@@ -178,6 +178,7 @@ def get_modified_env_with_pythonpath():
else:
new_pythonpath = module_path
env['PYTHONPATH'] = new_pythonpath
+ env['MACOSX_DEPLOYMENT_TARGET'] = "14.0"
Review Comment:
The cython tests that were running setup.py in a subprocess and linking
against the pyarrow-exposed arrow libraries were failing on macOS without this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]