> I think you need to add:
>
> export PYARROW_WITH_DATASET=1

This worked, thanks. I think the documentation [1] may need to be fixed to clarify that DATASET is also an optional component.

[1] https://arrow.apache.org/docs/developers/python.html#build-and-test

Yaron.

________________________________
From: Yaron Gvili <rt...@hotmail.com>
Sent: Tuesday, May 10, 2022 1:24 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: PyArrow builds but fails to load pyarrow._dataset

> Does `import pyarrow` work?

Yes. Also, all but one unit test succeeded:

======================== short test summary info ========================
FAILED pyarrow/tests/parquet/test_dataset.py::test_partitioned_dataset[True] - ModuleNotFoundError: No module named 'pyarrow._dataset'
== 1 failed, 3382 passed, 834 skipped, 17 xfailed, 2 xpassed, 14 warnings in 44.92s ==

Yaron.

________________________________
From: Antoine Pitrou <anto...@python.org>
Sent: Tuesday, May 10, 2022 1:17 PM
To: dev@arrow.apache.org <dev@arrow.apache.org>
Subject: Re: PyArrow builds but fails to load pyarrow._dataset

On 10/05/2022 at 19:16, Antoine Pitrou wrote:
>
> That said, tests which require should be skipped gracefully instead of
> failing.

Oops... some words got swallowed: tests which require *the dataset module* should be skipped gracefully instead of failing.

>
>
> On 10/05/2022 at 19:13, Weston Pace wrote:
>> I think you need to add:
>>
>> export PYARROW_WITH_DATASET=1
>>
>> On Tue, May 10, 2022 at 7:07 AM Yaron Gvili <rt...@hotmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I ran into a problem with running PyArrow that I locally built. The build
>>> worked fine (or so it seems) but then the testing procedure had a failure
>>> due to not being able to load pyarrow._dataset, which I manually confirmed.
>>> I'd appreciate any guidance on how to fix this error.
>>>
>>> Below are the commands I used to build and test along with the failure
>>> console-output (other console-output, for successful commands, is not
>>> included), followed by my manual confirmation:
>>>
>>> $ conda activate pyarrow-dev
>>> $ mkdir -p arrow/cpp/build/pyarrow-release
>>> $ pushd arrow/cpp/build/pyarrow-release
>>> $ cmake -GNinja -DCMAKE_INSTALL_PREFIX=$ARROW_HOME
>>> -DCMAKE_INSTALL_LIBDIR=lib $(for a in COMPUTE DATASET ENGINE FILESYSTEM IPC
>>> PARQUET PYTHON WITH_BZ2 WITH_ZLIB WITH_ZSTD WITH_LZ4 WITH_SNAPPY
>>> WITH_BROTLI BUILD_TESTS; do echo "-DARROW_${a}=ON"; done)
>>> -DPARQUET_REQUIRE_ENCRYPTION=ON ../..
>>> $ ninja -j 6
>>> $ cmake --build . --target install
>>> $ popd
>>> $ pushd arrow/python
>>> $ export PYARROW_WITH_PARQUET=1
>>> $ export PYARROW_WITH_PARQUET_ENCRYPTION=1
>>> $ python setup.py build_ext --inplace
>>> $ python -m pytest pyarrow/
>>> ...
>>> FAILED
>>> pyarrow/tests/parquet/test_dataset.py::test_partitioned_dataset[True] -
>>> ModuleNotFoundError: No module named 'pyarrow._dataset'
>>> ...
>>> $ python
>>> Python 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:25:59)
>>> [GCC 10.3.0] on linux
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> import pyarrow._dataset
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> ModuleNotFoundError: No module named 'pyarrow._dataset'
>>>
>>>
>>> Cheers,
>>> Yaron.
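
The manual check at the end of the original report (importing pyarrow._dataset in the interpreter) can be scripted. The following is only a minimal sketch, not code from the PyArrow test suite: the helper name `optional_module_available` is hypothetical, and it relies solely on the standard library's `importlib.import_module`. It reports whether an optional compiled module was built, which is the condition a test would need to check before deciding to skip rather than fail.

```python
import importlib


def optional_module_available(name: str) -> bool:
    """Return True if `name` can be imported, False otherwise.

    Hypothetical helper for illustration; not part of PyArrow.
    """
    try:
        importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so this also
        # covers an extension module that was never built.
        return False
    return True


if __name__ == "__main__":
    # Module names taken from the thread above.
    for name in ("pyarrow", "pyarrow._dataset"):
        status = "available" if optional_module_available(name) else "missing"
        print(f"{name}: {status}")
```

If `pyarrow` is reported as available and `pyarrow._dataset` as missing, that matches the situation described in this thread, where the Python build was run without PYARROW_WITH_DATASET=1. In a pytest-based suite, a similar guard could presumably be expressed with `pytest.importorskip`, though whether that fits the existing PyArrow test layout is an assumption.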