This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git
The following commit(s) were added to refs/heads/main by this push:
new 58b897b5f9 Move arrow-pyarrow tests that require `pyarrow` to be
installed into `arrow-pyarrow-testing` crate (#7742)
58b897b5f9 is described below
commit 58b897b5f975ddb331adcb438a3611a6a9776a3a
Author: Andrew Lamb <[email protected]>
AuthorDate: Mon Jul 7 11:36:48 2025 -0400
Move arrow-pyarrow tests that require `pyarrow` to be installed into
`arrow-pyarrow-testing` crate (#7742)
# Which issue does this PR close?
- Related to https://github.com/apache/arrow-rs/issues/7394
- Closes https://github.com/apache/arrow-rs/issues/7736
# Rationale for this change
At its core, if someone isn't using or modifying the pyarrow integration
for arrow-rs, they shouldn't have to install or configure Python to get
the tests working in `arrow-rs`.
After the change in https://github.com/apache/arrow-rs/pull/7694,
running `cargo test --workspace` also runs tests that require Python
to be set up and the `pyarrow` module to be installed. This is
problematic because:
1. Some people may not have that environment set up
2. Apparently you cannot use virtualenvs with PyO3 on macOS due to
https://github.com/PyO3/pyo3/issues/1741
The second item was very confusing for me while I tried to debug what
was going on, as I kept getting an error about pyarrow not being
installed, even though it was installed in my `venv`:
```
thread 'test_to_pyarrow' panicked at arrow-pyarrow/tests/pyarrow.rs:43:6:
called `Result::unwrap()` on an `Err` value: PyErr { type: <class
'ModuleNotFoundError'>, value: ModuleNotFoundError("No module named
'pyarrow'"), traceback: None }
```
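A quick way to narrow down an error like the one above is to check whether a given interpreter can import `pyarrow` at all. The following is a minimal sketch using only the Rust standard library; `python_can_import` is a hypothetical helper, not part of arrow-rs or PyO3. Note that it checks whichever interpreter is named on `PATH`, which is not necessarily the one PyO3 links against (that mismatch is the crux of the virtualenv issue linked above), so a `true` result does not guarantee the tests will pass.

```rust
use std::process::{Command, Stdio};

/// Returns `true` if `interpreter` starts and can import `module`.
///
/// Hypothetical helper for illustration only. It spawns the interpreter
/// as a subprocess; it does not inspect the Python that PyO3 embeds.
fn python_can_import(interpreter: &str, module: &str) -> bool {
    Command::new(interpreter)
        .arg("-c")
        .arg(format!("import {module}"))
        // Silence the traceback Python prints when the import fails.
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        // A non-zero exit status means the import failed.
        .map(|status| status.success())
        // An `Err` here means the interpreter could not be started at all.
        .unwrap_or(false)
}
```

For example, `python_can_import("python3", "pyarrow")` reports whether the `python3` on `PATH` sees the package, which helps separate "pyarrow is not installed" from "PyO3 is using a different interpreter".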
# What changes are included in this PR?
1. Move the tests that require pyarrow to be installed into
`arrow-pyarrow-testing`, which is not part of the workspace and thus not
run with `cargo test --all`
2. Remove `--exclude arrow-pyarrow` from the `cargo test --workspace`
invocations in CI
3. Add documentation on the rationale and hints about running the tests
# Frequently Asked Questions
## Why not add `--exclude arrow-pyarrow` to
`verify_release_candidate.sh`?
While the minimal fix would be to add `--exclude arrow-pyarrow` to
`verify_release_candidate.sh`, that would still require all users of
arrow to remember to add `--exclude arrow-pyarrow` to their tests even
if they don't care about Python.
## Why not in `arrow-pyarrow-integration-testing`?
I did not put this test in `arrow-pyarrow-integration-testing` because
that module doesn't compile for me with the stock Python install.
Somehow Python needs to be installed with the ability to build dynamic
libraries, which I haven't figured out how to do and don't really want
to. It seems maybe related to
https://pyo3.rs/v0.18.1/getting_started#python (thanks
to @Xuanwo for the pointer in https://github.com/PyO3/pyo3/issues/2136 /
https://github.com/apache/opendal/issues/1675)
```
(venv) root@5e8d0406fabe:/arrow-rs/arrow-pyarrow-integration-testing# cargo
test --test pyarrow
warning: `/arrow-rs/arrow-pyarrow-integration-testing/.cargo/config` is
deprecated in favor of `config.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink
`config` to `config.toml`
Compiling target-lexicon v0.13.2
Compiling flatbuffers v25.2.10
Compiling pyo3-build-config v0.24.2
Compiling arrow-ipc v55.2.0 (/arrow-rs/arrow-ipc)
Compiling pyo3-macros-backend v0.24.2
Compiling pyo3-ffi v0.24.2
Compiling pyo3 v0.24.2
Compiling pyo3-macros v0.24.2
Compiling arrow-pyarrow v55.2.0 (/arrow-rs/arrow-pyarrow)
Compiling arrow v55.2.0 (/arrow-rs/arrow)
Compiling arrow-pyarrow-integration-testing v0.1.0
(/arrow-rs/arrow-pyarrow-integration-testing)
error: linking with `cc` failed: exit status: 1
|
= note: "cc" "/tmp/rustc0jx15I/symbols.o" "<43 object files omitted>"
"-Wl,--as-needed" "-Wl,-Bstatic"
"<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib/{libtest-*,libgetopts-*,libunicode_width-*,librustc_std_workspace_std-*}.rlib"
"/arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/{libarrow-7996898a6777f964.rlib,libarrow_row-63508de6e52f4d4d.rlib,libarrow_pyarrow-8b510eeadc952ad2.rlib,libpyo3-c463c3a2243eeab9.rlib,libmemoffset-836dc1ddd866c614.rlib,libpyo3_ffi-fbf18
[...]
= note: some arguments are omitted. use `--verbose` to show all linker
arguments
= note: /usr/bin/ld:
/arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/libarrow_pyarrow-8b510eeadc952ad2.rlib(arrow_pyarrow-8b510eeadc952ad2.8xxa5xo5oql7wlj24034o033n.rcgu.o):
in function `<pyo3::instance::Borrowed<pyo3::types::tuple::PyTuple> as
pyo3::call::PyCallArgs>::call_positional':
/root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/pyo3-0.24.2/src/call.rs:213:
undefined reference to `PyObject_Call'
```
# Are there any user-facing changes?
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
If there are any breaking changes to public APIs, please call them out.
---------
Co-authored-by: Copilot <[email protected]>
---
.github/workflows/integration.yml | 5 ++-
.github/workflows/rust.yml | 5 +--
Cargo.toml | 3 ++
arrow-pyarrow-testing/Cargo.toml | 51 ++++++++++++++++++++++
arrow-pyarrow-testing/src/lib.rs | 20 +++++++++
.../tests/pyarrow.rs | 22 ++++++++++
6 files changed, 101 insertions(+), 5 deletions(-)
diff --git a/.github/workflows/integration.yml b/.github/workflows/integration.yml
index 1b6eeb15dc..0971171929 100644
--- a/.github/workflows/integration.yml
+++ b/.github/workflows/integration.yml
@@ -165,8 +165,9 @@ jobs:
- name: Run Rust tests
run: |
source venv/bin/activate
- cargo test -p arrow-pyarrow
- - name: Run tests
+ cd arrow-pyarrow-testing
+ cargo test
+ - name: Run Python tests
run: |
source venv/bin/activate
cd arrow-pyarrow-integration-testing
diff --git a/.github/workflows/rust.yml b/.github/workflows/rust.yml
index a20575391b..e4ffb10a11 100644
--- a/.github/workflows/rust.yml
+++ b/.github/workflows/rust.yml
@@ -52,7 +52,7 @@ jobs:
# do not produce debug symbols to keep memory usage down
export RUSTFLAGS="-C debuginfo=0"
# PyArrow tests happen in integration.yml.
- cargo test --workspace --exclude arrow-pyarrow
+ cargo test --workspace
# Check workspace wide compile and test with default features for
@@ -84,8 +84,7 @@ jobs:
# do not produce debug symbols to keep memory usage down
export RUSTFLAGS="-C debuginfo=0"
export PATH=$PATH:/d/protoc/bin
- # PyArrow tests happen in integration.yml.
- cargo test --workspace --exclude arrow-pyarrow
+ cargo test --workspace
# Run cargo fmt for all crates
diff --git a/Cargo.toml b/Cargo.toml
index a9b00f9537..1083c9444c 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -55,6 +55,9 @@ members = [
resolver = "2"
exclude = [
+ # arrow-pyarrow-testing is excluded because it requires a Python interpreter with the pyarrow package installed,
+ # which makes running `cargo test --all` fail if the appropriate Python environment is not set up.
+ "arrow-pyarrow-testing",
# arrow-pyarrow-integration-testing is excluded because it requires
different compilation flags, thereby
# significantly changing how it is compiled within the workspace, causing
the whole workspace to be compiled from
# scratch this way, this is a stand-alone package that compiles
independently of the others.
diff --git a/arrow-pyarrow-testing/Cargo.toml b/arrow-pyarrow-testing/Cargo.toml
new file mode 100644
index 0000000000..96c20d31bb
--- /dev/null
+++ b/arrow-pyarrow-testing/Cargo.toml
@@ -0,0 +1,51 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Note this package is not published to crates.io, it is only used for testing
+# the arrow-pyarrow crate in the arrow-rs repository.
+#
+# It is not part of the workspace so that `cargo test --all` does not require
+# a Python interpreter or the pyarrow package to be installed.
+#
+# It is used to run tests that require a Python interpreter and the pyarrow
+# package installed. It is not intended to be used as a library or a standalone
+# application.
+#
+# It is different from `arrow-pyarrow-integration-testing` in that it works
+# with a standard pyarrow installation, rather than building a dynamic library
+# that can be loaded by Python (which requires additional configuraton of the
+# Python environment).
+
+[package]
+name = "arrow-pyarrow-testing"
+description = "Tests for arrow-pyarrow that require only a Python interpreter and pyarrow installed"
+version = "0.1.0"
+homepage = "https://github.com/apache/arrow-rs"
+repository = "https://github.com/apache/arrow-rs"
+authors = ["Apache Arrow <[email protected]>"]
+license = "Apache-2.0"
+keywords = [ "arrow" ]
+edition = "2021"
+rust-version = "1.81"
+publish = false
+
+
+[dependencies]
+# Note no dependency on arrow, to ensure arrow-pyarrow can be used by itself
+arrow-array = { path = "../arrow-array" }
+arrow-pyarrow = { path = "../arrow-pyarrow" }
+pyo3 = { version = "0.25", default-features = false }
diff --git a/arrow-pyarrow-testing/src/lib.rs b/arrow-pyarrow-testing/src/lib.rs
new file mode 100644
index 0000000000..31b805c573
--- /dev/null
+++ b/arrow-pyarrow-testing/src/lib.rs
@@ -0,0 +1,20 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! This crate exists to provide a test environment for the `arrow-pyarrow` crate.
+//! It is not intended to be used by itself. See comments in Cargo.toml for more
+//! details.
\ No newline at end of file
diff --git a/arrow-pyarrow/tests/pyarrow.rs b/arrow-pyarrow-testing/tests/pyarrow.rs
similarity index 83%
rename from arrow-pyarrow/tests/pyarrow.rs
rename to arrow-pyarrow-testing/tests/pyarrow.rs
index 12e2f97abf..3d3c30cf21 100644
--- a/arrow-pyarrow/tests/pyarrow.rs
+++ b/arrow-pyarrow-testing/tests/pyarrow.rs
@@ -15,6 +15,28 @@
// specific language governing permissions and limitations
// under the License.
+//! Tests pyarrow bindings
+//!
+//! This test requires installing the `pyarrow` python package. If you do not
+//! have this package installed, you will see an error such as the following:
+//!
+//! ```text
+//! PyErr { type: <class 'ModuleNotFoundError'>, value: ModuleNotFoundError("No module named 'pyarrow'"), traceback: None }
+//! ```
+//!
+//! # Notes
+//!
+//! You can not use a virtual environment to run these tests on MacOS, as it will
+//! fail to find the pyarrow module due to <https://github.com/PyO3/pyo3/issues/1741>
+//!
+//! One way to run them is to install the `pyarrow` package in the system Python,
+//! which might break other packages, so use with caution:
+//!
+//! ```shell
+//! brew install pipx
+//! pip3 install --break-system-packages pyarrow
+//! ```
+
use arrow_array::builder::{BinaryViewBuilder, StringViewBuilder};
use arrow_array::{
Array, ArrayRef, BinaryViewArray, Int32Array, RecordBatch, StringArray,
StringViewArray,